GENETIC CONTROL OF FLOWERING

This invention relates to the genetic control of
flowering in plants and the cloning and expression of
genes involved therein. More particularly, the
invention relates to the cloning and expression of the
CONSTANS (CO) gene of Arabidopsis thaliana, and
homologues from other species, including Brassica napus,
and manipulation and use of the gene in plants.

Efficient flowering in plants is important,
particularly when the intended product is the flower or
the seed produced therefrom. One aspect of this is the
timing of flowering: advancing or retarding the onset of
flowering can be useful to farmers and seed producers.
An understanding of the genetic mechanisms which
influence flowering provides a means for altering the
flowering characteristics of the target plant. Species
for which flowering is important to crop production are
numerous, essentially all crops which are grown from
seed, with important examples being the cereals, rice
and maize, probably the most agronomically important in
warmer climatic zones, and wheat, barley, oats and rye
in more temperate climates. Important seed products are
oil seed rape and canola, sugar beet, maize, sunflower,
soyabean and sorghum. Many crops which are harvested
for their roots are, of course, grown annually from seed
and the production of seed of any kind is very dependent
upon the ability of the plant to flower, to be
pollinated and to set seed. In horticulture, control of
the timing of flowering is important. Horticultural
plants whose flowering may be controlled include
lettuce, endive and vegetable brassicas including
cabbage, broccoli and cauliflower, and carnations and
geraniums.

Arabidopsis thaliana is a facultative long day
plant, flowering early under long days and late under
short days. Because it has a small, well-characterized
genome, is relatively easily transformed and regenerated
and has a rapid growing cycle, Arabidopsis is an ideal
model plant in which to study flowering and its control.

We have discovered that one of the genes required
for this response to photoperiod is the CONSTANS or CO
gene, also called FG. We have found that plants
carrying mutations of this gene flower later than their
wild-types under long days but at the same time under
short days, and we conclude, therefore, that the CO gene
product is involved in the promotion of flowering under
long days.

Putterill et al, Mol. Gen. Genet. 239: 145-157
(1993) describes preliminary cloning work which involved
chromosome walking with yeast artificial chromosome
(YAC) libraries and isolation of 1700kb of contiguous
DNA on chromosome 5 of Arabidopsis, including a 300kb
region containing the gene CO. That work fell short of
cloning and identification of the CO gene.

We have now cloned and sequenced the CO gene
(Putterill et al., 1995), which is provided herein.
Unexpected difficulties and complications were
encountered which made the cloning harder than
anticipated, as is discussed below in the experimental
section.

According to a first aspect of the present
invention there is provided a nucleic acid molecule
comprising a nucleotide sequence encoding a polypeptide
with CO function. Those skilled in the art will
appreciate that "CO function" may be used to refer to
the ability to influence the timing of flowering
phenotypically like the CO gene of Arabidopsis thaliana
(the timing being substantially unaffected by
vernalisation), especially the ability to complement a
co mutation in Arabidopsis thaliana, or the co phenotype
in another species. CO mutants exhibit delayed
flowering under long days, the timing of flowering being
substantially unaffected by vernalisation (see, for
example, Korneef et al. (1991)).

Nucleic acid according to the present invention may
have the sequence of a CO gene of Arabidopsis thaliana,
or be a mutant, derivative or allele of the sequence
provided. Preferred mutants, derivatives and alleles
are those which encode a protein which retains a
functional characteristic of the protein encoded by the
wild-type gene, especially the ability to promote
flowering as discussed herein. Other preferred mutants,
derivatives and alleles encode a protein which delays
flowering compared to wild-type or a gene with the
sequence provided. Changes to a sequence, to produce a
mutant or derivative, may be by one or more of addition,
insertion, deletion or substitution of one or more
nucleotides in the nucleic acid, leading to the
addition, insertion, deletion or substitution of one or
more amino acids in the encoded polypeptide. Of course,
changes to the nucleic acid which make no difference to
the encoded amino acid sequence are included.
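
By way of illustration only, a change to the nucleic acid
which makes no difference to the encoded amino acid
sequence can be sketched in Python as follows. The short
sequences, the function name and the codon table excerpt
are assumptions made for this example and are not taken
from Figure 1; only a handful of codons of the standard
genetic code are listed.

```python
# Excerpt of the standard genetic code (DNA codons). Only the codons
# needed for this illustration are listed.
CODON_TABLE = {
    "ATG": "M", "TGT": "C", "TGC": "C",
    "GAA": "E", "GAG": "E", "AAA": "K", "TAA": "*",
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequences, not taken from Figure 1.
original = "ATGTGTGAAAAATAA"
silent = "ATGTGCGAGAAATAA"    # TGT->TGC and GAA->GAG are both synonymous
missense = "ATGTGTAAAAAATAA"  # GAA->AAA replaces Glu with Lys

print(translate(original))  # MCEK
print(translate(silent))    # MCEK
print(translate(missense))  # MCKK
```

The first two sequences differ at two nucleotides yet
encode the same polypeptide, whereas the third encodes a
polypeptide with one substituted amino acid.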

A preferred nucleic acid sequence for a CO gene is
shown in Figure 1, along with the encoded amino acid
sequence of a polypeptide which has CO function.

The present invention also provides a vector which
comprises nucleic acid with any one of the provided
sequences, preferably a vector from which polypeptide
encoded by the nucleic acid sequence can be expressed.
The vector is preferably suitable for transformation
into a plant cell. The invention further encompasses a
host cell transformed with such a vector, especially a
plant cell. Thus, a host cell, such as a plant cell,
comprising nucleic acid according to the present
invention is provided. Within the cell, the nucleic
acid may be incorporated within the chromosome. There
may be more than one heterologous nucleotide sequence
per haploid genome. This, for example, enables
increased expression of the gene product compared with
endogenous levels, as discussed below.

A vector comprising nucleic acid according to the
present invention need not include a promoter or other
regulatory sequence, particularly if the vector is to be
used to introduce the nucleic acid into cells for
recombination into the genome.

Nucleic acid molecules and vectors according to the
present invention may be provided isolated from their
natural environment, in substantially pure or
homogeneous form, or free or substantially free of
nucleic acid or genes of the species of interest or
origin other than the sequence encoding a polypeptide
able to influence flowering, eg in Arabidopsis thaliana
nucleic acid other than the CO sequence.

Nucleic acid may of course be double- or single- 
stranded, cDNA or genomic DNA, RNA, wholly or partially 
synthetic, as appropriate. 

The present invention also encompasses the
expression product of any of the nucleic acid sequences
disclosed and methods of making the expression product
by expression from encoding nucleic acid therefor under
suitable conditions in suitable host cells. Those
skilled in the art are well able to construct vectors
and design protocols for expression and recovery of
products of recombinant gene expression. Suitable
vectors can be chosen or constructed, containing
appropriate regulatory sequences, including promoter
sequences, terminator fragments, polyadenylation
sequences, enhancer sequences, marker genes and other
sequences as appropriate. For further details see, for
example, Molecular Cloning: a Laboratory Manual: 2nd
edition, Sambrook et al, 1989, Cold Spring Harbor
Laboratory Press. Transformation procedures depend on
the host used, but are well known.

The present invention further encompasses a plant
comprising a plant cell comprising nucleic acid
according to the present invention, and selfed or hybrid
progeny and any descendant of such a plant, also any
part or propagule of such a plant, progeny or
descendant, including seed.

A further aspect of the present invention provides
a method of identifying and cloning CO homologues from
plant species other than Arabidopsis thaliana which
method employs a nucleotide sequence derived from that
shown in Figure 1. The genes whose sequences are shown
in Figure 5 and Figure 6 were cloned in this way.
Sequences derived from these may themselves be used in
identifying and in cloning other sequences. The
nucleotide sequence information provided herein, or any
part thereof, may be used in a data-base search to find
homologous sequences, expression products of which can
be tested for ability to influence a flowering
characteristic. These may have "CO function" or the
ability to complement a mutant phenotype, which
phenotype is delayed flowering (especially under long
days), preferably the timing of flowering being
substantially unaffected by vernalisation, as disclosed
herein. Alternatively, nucleic acid libraries may be
screened using techniques well known to those skilled in
the art and homologous sequences thereby identified then
tested.

The present invention also extends to nucleic acid
encoding a CO homologue obtained using a nucleotide
sequence derived from that shown in Figure 1. CO
homologue sequences are shown in Figures 5 and 6. Also
encompassed by the invention is nucleic acid encoding a
CO homologue obtained using a nucleotide sequence
derived from a sequence shown in Figure 5 or Figure 6.

The CO protein contains an arrangement of cysteines
at the amino end of the protein that is characteristic
of zinc fingers, such as those contained within the GATA
transcription factors (discussed by Ramain et al, 1993;
Sanchez-Garcia and Rabbitts, 1994). Seven independently
isolated co mutants have been described, and we have
identified the sequence changes causing a reduction in
CO activity in all seven cases. Five of them have
alterations within regions proposed from their sequence
to form zinc fingers, and the other two have changes in
adjacent amino acids at the carboxy terminus of the
protein. The positions of these alterations support our
interpretation that CO encodes a protein containing zinc
fingers that probably binds DNA and acts as a
transcription factor.

The provision of sequence information for the CO
gene of Arabidopsis thaliana enables homologous
sequences to be obtained from other plant species. In
Southern hybridization experiments a probe containing
the CO gene of Arabidopsis thaliana hybridises to DNA
extracted from Brassica nigra, Brassica napus and
Brassica oleracea. Different varieties of these
species display restriction fragment length
polymorphisms (RFLPs) when their DNA is cleaved with a
restriction enzyme and hybridised to a CO probe. These
RFLPs may then be used to map the CO gene relative to
other RFLPs of known position. In this way, for example,
three CO gene homologues were mapped to linkage groups
N5, N2 and N12 of Brassica napus (D. Lydiate,
unpublished). The populations used for RFLP mapping had
previously been scored for flowering time and it was
demonstrated that particular alleles of the CO
homologues segregated together with allelic variations
affecting flowering time. The loci mapped to linkage
groups N2 and N12 showed the most extreme allelic
variation for flowering time.
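
For illustration, the origin of a restriction fragment
length polymorphism can be sketched in Python as
follows. A single base difference between two alleles
removes a recognition site, so that cleavage yields
fragments of different lengths. The allele sequences and
the function name are hypothetical, and the use of EcoRI
(which recognises GAATTC and cuts after the first G) is
an assumption made for this example; the enzymes used in
the mapping work are not specified here. The sketch also
ignores the hybridisation step, in which only fragments
overlapping the probe would be detected.

```python
def fragment_lengths(dna, site="GAATTC", cut_offset=1):
    """Return the lengths of fragments from cutting linear DNA at each site.

    The defaults model EcoRI, which recognises GAATTC and cuts after the
    first G. The choice of enzyme is an assumption for illustration.
    """
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: one base change (A->G) abolishes the second site.
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print(fragment_lengths(allele_a))  # [5, 14, 9]
print(fragment_lengths(allele_b))  # [5, 23]
```

The two alleles give different fragment patterns, which
is the kind of difference that can be followed in a
segregating population.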

Successful cloning of two Brassica napus homologues
is described in Example 5.

This confirms that genes homologous to the CO gene
of Arabidopsis regulate flowering time in other plant
species.

Thus, included within the scope of the present
invention are nucleic acid molecules which encode amino
acid sequences which are homologues of CO of Arabidopsis
thaliana. Homology may be at the nucleotide sequence
and/or amino acid sequence level. Preferably, the
nucleic acid sequence shares homology with the sequence
encoded by the nucleotide sequence of Figure 1,
preferably at least about 50%, or 60%, or 70%, or 80%
homology, most preferably at least 90% homology, from
species other than Arabidopsis thaliana and the encoded
polypeptide shares a phenotype with the Arabidopsis
thaliana CO gene, preferably the ability to influence
timing of flowering. These may promote or delay
flowering compared with Arabidopsis thaliana CO and
mutants, derivatives or alleles may promote or delay
flowering compared with wild-type.
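
As a rough illustration of how a percentage figure of
this kind may be computed, the following Python sketch
calculates percent identity over the aligned, ungapped
positions of two sequences. This is only one possible
measure; the text does not specify how homology is to be
calculated, so the method, the function name and the
short aligned peptides are all assumptions made for this
example.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of aligned, non-gap positions at which the residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # positions opposite a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical pre-aligned peptide fragments, not taken from the Figures.
reference = "MCDYCGAAR-"
candidate = "MCDFCGSARK"

identity = percent_identity(reference, candidate)
print(round(identity, 1))  # 77.8

# Thresholds named in the text that this hypothetical pair would meet.
print([t for t in (50, 60, 70, 80, 90) if identity >= t])  # [50, 60, 70]
```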

CO gene homologues may also be identified from
economically important monocotyledonous crop plants such
as rice and maize. Although genes encoding the same
protein in monocotyledonous and dicotyledonous plants
show relatively little homology at the nucleotide level,
amino acid sequences are conserved. In public sequence
databases we recently identified several Arabidopsis
cDNA clone sequences that were obtained in random
sequencing programmes and share homology with CO in
regions of the protein that are known to be important
for its activity. Similarly, among randomly sequenced
rice cDNAs we identified one clone that shared
relatively little homology to CO at the DNA level but
showed high homology at the amino acid level. This
clone, and another one that we have identified from
maize, may be used to identify the whole CO gene
family from rice and other cereals. By sequencing each
of these clones, studying their expression patterns and
examining the effect of altering their expression, genes
carrying out a similar function to CO in regulating
flowering time are obtainable. Of course, mutants,
derivatives and alleles of these sequences are included
within the scope of the present invention in the same
terms as discussed above for the Arabidopsis thaliana CO
gene.

Nucleic acid according to the invention may
comprise a nucleotide sequence encoding a polypeptide
able to complement a mutant phenotype which is delayed
flowering, the timing of flowering being substantially
unaffected by vernalisation. The delayed flowering may
be under long days. Also the present invention provides
nucleic acid comprising a nucleotide sequence which is a
mutant or derivative of a wild-type gene encoding a
polypeptide with ability to influence the timing of
flowering, the mutant or derivative phenotype being
early or delayed flowering with the timing of flowering
being substantially unaffected by vernalisation. These
are distinguished from the LD gene reported by Lee et
al.

Vernalisation is low-temperature (usually just
above 0°C) treatment of plant (seedlings) or seed for a
period of usually a few weeks, probably about 30 days.
It is a treatment required by some plant species before
they will break bud or flower, simulating the effect of
winter cold.

Also according to the invention there is provided a
plant cell having incorporated into its genome a
sequence of nucleotides as provided by the present
invention, under operative control of a regulatory
sequence for control of expression. A further aspect of
the present invention provides a method of making such a
plant cell involving introduction of a vector comprising
the sequence of nucleotides into a plant cell and
causing or allowing recombination between the vector and
the plant cell genome to introduce the sequence of
nucleotides into the genome.

Plants which comprise a plant cell according to the 
invention are also provided, along with any part or 
propagule thereof, seed, selfed or hybrid progeny and 
descendants.

The invention further provides a method of
influencing the flowering characteristics of a plant
comprising expression of a heterologous CO gene sequence
(or mutant, allele, derivative or homologue thereof, as
discussed) within cells of the plant. The term
"heterologous" indicates that the gene/sequence of
nucleotides in question has been introduced into said
cells of the plant using genetic engineering, ie by
human intervention. The gene may be on an extra-genomic
vector or incorporated, preferably stably, into the
genome. The heterologous gene may replace an endogenous
equivalent gene, ie one which normally performs the same
or a similar function in control of flowering, or the
inserted sequence may be additional to the endogenous
gene. An advantage of introduction of a heterologous
gene is the ability to place expression of the gene
under the control of a promoter of choice, in order to
be able to influence gene expression, and therefore
flowering, according to preference. Furthermore,
mutants and derivatives of the wild-type gene, eg with
higher or lower activity than wild-type, may be used in
place of the endogenous gene.

The principal flowering characteristic which may be
altered using the present invention is the timing of
flowering. Under-expression of the gene product of the
CO gene leads to delayed flowering (as suggested by the
co mutant phenotype); over-expression may lead to
precocious flowering (as demonstrated with transgenic
Arabidopsis plants carrying extra copies of the CO gene
and by expression from CaMV 35S promoter). This degree
of control is useful to ensure synchronous flowering of
male and female parent lines in hybrid production, for
example. Another use is to advance or retard the
flowering in accordance with the dictates of the climate
so as to extend or reduce the growing season. This may
involve use of anti-sense or sense regulation.

A second flowering characteristic that may be
altered is the distribution of flowers on the shoot. In
Arabidopsis, flowers develop on the sides but not at the
apex of the shoot. This is determined by the location
of expression of the LEAFY genes (Weigel et al., 1992),
and mutations such as terminal flower (Shannon and
Meeks-Wagner, 1991) that cause LEAFY to be expressed in
the apex of the shoot also lead to flowers developing at
the apex. There is evidence that CO is required for
full activity of LEAFY (Putterill et al., 1995), and
therefore by increasing or altering the pattern of CO
expression the level and positions of LEAFY expression,
and therefore of flower development, may also be
altered. This is exemplified in Example 4. This may be
employed advantageously in creating new varieties of
horticultural species with altered arrangements of
flowers.

The nucleic acid according to the invention, such
as a CO gene or homologue, may be placed under the
control of an externally inducible gene promoter to
place the timing of flowering under the control of the
user. The use of an inducible promoter is described
below. This is advantageous in that flower production,
and subsequent events such as seed set, may be timed to
meet market demands, for example, in cut flowers or
decorative flowering pot plants. Delaying flowering in
pot plants is advantageous to lengthen the period
available for transport of the product from the producer
to the point of sale, and lengthening of the flowering
period is an obvious advantage to the purchaser.

The term "inducible" as applied to a promoter is
well understood by those skilled in the art. In
essence, expression under the control of an inducible
promoter is "switched on" or increased in response to an
applied stimulus. The nature of the stimulus varies
between promoters. Some inducible promoters cause
little or undetectable levels of expression (or no
expression) in the absence of the appropriate stimulus.
Other inducible promoters cause detectable constitutive
expression in the absence of the stimulus. Whatever the
level of expression is in the absence of the stimulus,
expression from any inducible promoter is increased in
the presence of the correct stimulus. The preferable
situation is where the level of expression increases
upon application of the relevant stimulus by an amount
effective to alter a phenotypic characteristic. Thus an
inducible (or "switchable") promoter may be used which
causes a basic level of expression in the absence of the
stimulus which level is too low to bring about a desired
phenotype (and may in fact be zero). Upon application
of the stimulus, expression is increased (or switched
on) to a level which brings about the desired phenotype.



Suitable promoters include the Cauliflower Mosaic
Virus 35S (CaMV 35S) gene promoter that is expressed at
a high level in virtually all plant tissues (Benfey et
al, 1990a and 1990b); the maize glutathione-S-transferase
isoform II (GST-II-27) gene promoter which
is activated in response to application of exogenous
safener (WO93/01294, ICI Ltd); the cauliflower meri 5
promoter that is expressed in the vegetative apical
meristem as well as several well localised positions in
the plant body, eg inner phloem, flower primordia,
branching points in root and shoot (Medford, 1992;
Medford et al, 1991) and the Arabidopsis thaliana LEAFY
promoter that is expressed very early in flower
development (Weigel et al, 1992).



When introducing a chosen gene construct into a
cell, certain considerations must be taken into account,
well known to those skilled in the art. The nucleic
acid to be inserted should be assembled within a
construct which contains effective regulatory elements
which will drive transcription. There must be available
a method of transporting the construct into the cell.
Once the construct is within the cell membrane,
integration into the endogenous chromosomal material
either will or will not occur. Finally, as far as
plants are concerned the target cell type must be such
that cells can be regenerated into whole plants.

Plants transformed with a DNA segment containing
the sequence may be produced by standard techniques for
the genetic manipulation of plants. DNA can be
transformed into plant cells using any suitable
technology, such as a disarmed Ti-plasmid vector carried
by Agrobacterium exploiting its natural gene transfer
ability (EP-A-270355, EP-A-0116718, NAR 12(22) 8711-8721
1984), particle or microprojectile bombardment (US
5100792, EP-A-444882, EP-A-434616), microinjection (WO
92/09696, WO 94/00583, EP 331083, EP 175966),
electroporation (EP 290395, WO 8706614) or other forms
of direct DNA uptake (DE 4005152, WO 9012096, US
4684611).



Agrobacterium transformation is widely used
by those skilled in the art to transform dicotyledonous
species. Although Agrobacterium has been reported to be
able to transform foreign DNA into some monocotyledonous
species (WO 92/14828), microprojectile bombardment,
electroporation and direct DNA uptake are preferred
where Agrobacterium is inefficient or ineffective.
Alternatively, a combination of different techniques may
be employed to enhance the efficiency of the
transformation process, eg bombardment with
Agrobacterium coated microparticles (EP-A-486234) or
microprojectile bombardment to induce wounding followed
by co-cultivation with Agrobacterium (EP-A-486233).



The particular choice of a transformation
technology will be determined by its efficiency to
transform certain plant species as well as the
experience and preference of the person practising the
invention with a particular methodology of choice. It
will be apparent to the skilled person that the
particular choice of a transformation system to
introduce nucleic acid into plant cells is not essential
to or a limitation of the invention.

In the present invention, over-expression may be
achieved by introduction of the nucleotide sequence in a
sense orientation. Thus, the present invention provides
a method of influencing a flowering characteristic of a
plant, the method comprising causing or allowing
expression of the polypeptide encoded by the nucleotide
sequence of nucleic acid according to the invention from
that nucleic acid within cells of the plant. (See
Example 4.)

Under-expression of the gene product polypeptide
may be achieved using anti-sense technology or "sense
regulation". The use of anti-sense genes or partial
gene sequences to down-regulate gene expression is now
well-established. DNA is placed under the control of a
promoter such that transcription of the "anti-sense"
strand of the DNA yields RNA which is complementary to
normal mRNA transcribed from the "sense" strand of the
target gene. For double-stranded DNA this is achieved
by placing a coding sequence or a fragment thereof in a
"reverse orientation" under the control of a promoter.
The complementary anti-sense RNA sequence is thought
then to bind with mRNA to form a duplex, inhibiting
translation of the endogenous mRNA from the target gene
into protein. Whether or not this is the actual mode of
action is still uncertain. However, it is established
fact that the technique works. See, for example,
Rothstein et al, 1987; Smith et al, 1988; Zhang et al,
1992.
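
The relationship between a "reverse orientation" insert
and the target mRNA can be sketched in Python as
follows. The sketch shows only the sequence
complementarity that is thought to allow duplex
formation; it says nothing about whether that is the
actual mode of action. The fragment and the function
names are hypothetical and are not taken from Figure 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Sequence read 5'->3' from the opposite strand of double-stranded DNA."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(strand):
    """RNA with the same sequence as the given DNA strand, with U for T."""
    return strand.replace("T", "U")


def pairs_antiparallel(rna_a, rna_b):
    """True if the two RNAs are fully complementary in antiparallel fashion."""
    return len(rna_a) == len(rna_b) and all(
        RNA_PAIR[a] == b for a, b in zip(rna_a, reversed(rna_b))
    )


# Hypothetical fragment of a coding ("sense") strand.
sense_fragment = "ATGGCATTCGGA"

# Placing the fragment in reverse orientation behind a promoter means the
# transcript has the sequence of the opposite strand.
antisense_insert = reverse_complement(sense_fragment)

mrna = transcribe(sense_fragment)
antisense_rna = transcribe(antisense_insert)

print(mrna)           # AUGGCAUUCGGA
print(antisense_rna)  # UCCGAAUGCCAU
print(pairs_antiparallel(mrna, antisense_rna))  # True
```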

Thus, the present invention also provides a method
of influencing a flowering characteristic of a plant,
the method comprising causing or allowing anti-sense
transcription from nucleic acid according to the
invention within cells of the plant.

When additional copies of the target gene are
inserted in sense, that is the same, orientation as the
target gene, a range of phenotypes is produced which
includes individuals where over-expression occurs and
some where under-expression of protein from the target
gene occurs. When the inserted gene is only part of the
endogenous gene the number of under-expressing
individuals in the transgenic population increases. The
mechanism by which sense regulation occurs, particularly
down-regulation, is not well-understood. However, this
technique is also well-reported in scientific and patent
literature and is used routinely for gene control. See,
for example, van der Krol, 1990; Napoli et al, 1990;
Zhang et al, 1992.

Thus, the present invention also provides a method
of influencing a flowering characteristic of a plant, the
method comprising causing or allowing expression from
nucleic acid according to the invention within cells of
the plant. This may be used to suppress activity of a
polypeptide with ability to influence a flowering
characteristic. Here the activity of the polypeptide is
preferably suppressed as a result of under-expression
within the plant cells.

As stated above, the expression pattern of the CO
gene may be altered by fusing it to a foreign promoter.
For example, International patent application WO93/01294
of Imperial Chemical Industries Limited describes a
chemically inducible gene promoter sequence isolated
from a 27 kD subunit of the maize glutathione-S-transferase,
isoform II gene (GST-II-27) (see Figure 2).
It has been found that when linked to an exogenous gene
and introduced into a plant by transformation, the
GST-II-27 promoter provides a means for the external
regulation of the expression of that exogenous gene.
The structural region of the CO gene is fused to the
GST-II-27 promoter downstream of the translation start
point shown in Figure 2.

The GST-II-27 gene promoter has been shown to be
induced by certain chemical compounds which can be
applied to growing plants. The promoter is functional
in both monocotyledons and dicotyledons. It can
therefore be used to control gene expression in a
variety of genetically modified plants, including field
crops such as canola, sunflower, tobacco, sugarbeet,
cotton; cereals such as wheat, barley, rice, maize,
sorghum; fruit such as tomatoes, mangoes, peaches,
apples, pears, strawberries, bananas, and melons; and
vegetables such as carrot, lettuce, cabbage and onion.
The GST-II-27 promoter is also suitable for use in a
variety of tissues, including roots, leaves, stems and
reproductive tissues.

Accordingly, the present invention provides in a
further aspect a gene construct comprising an inducible
promoter operatively linked to a nucleotide sequence
provided by the present invention, such as the CO gene
of Arabidopsis thaliana, a homologue from another plant
species or any mutant, derivative or allele thereof.
This enables control of expression of the gene. The
invention also provides plants transformed with said
gene construct and methods comprising introduction of
such a construct into a plant cell and/or induction of
expression of a construct within a plant cell, by
application of a suitable stimulus, an effective
exogenous inducer. The promoter may be the GST-II-27
gene promoter or any other inducible plant promoter.

Promotion of CO activity to cause early flowering

Mutations that reduce CO activity cause late
flowering under inductive long day conditions,
indicating CO involvement in promoting flowering under
long days. It is probably not required under
non-inductive short days because co mutations have no effect
on flowering time under these conditions. The CO
transcript is present at very low abundance under long
days and has only been detected by using PCR to amplify
cDNA. The observation that some transgenic plants
harbouring a T-DNA containing CO flowered slightly
earlier than wild type under long days and considerably
earlier than wild type under short days suggests that,
particularly under non-inductive short days, the level
of the CO transcript is limiting on flowering time.
This suggests that flowering could be manipulated by
using foreign promoters to alter the expression of the
gene:

Causing early flowering under non-inductive
conditions

Manipulation of CO transcript levels under
non-inductive conditions may lead to early, or regulated,
flowering. Promoter fusions such as those disclosed
herein enable expression of CO mRNA at a higher level
than that found in wild-type plants under non-inductive
conditions. Use of CaMV35S or meri 5 fusions leads to
early flowering while use of GSTII fusions leads to
regulated flowering.

Causing early flowering under inductive conditions

Wild-type Arabidopsis plants flower extremely
quickly under inductive conditions and the CO gene is
expressed prior to flowering, although at a low level.
Nevertheless, some transgenic wild-type plants
containing extra copies of CO have been shown to flower
slightly earlier than wild-type plants. The level of
the CO product may be increased by introduction of
promoter, eg CaMV35S or meri 5, fusions. Inducible
promoters, such as GSTII, may be used to regulate
flowering, eg by first creating a CO mutant of a
particular species and then introducing an inducible
promoter-CO fusion capable of complementation of the
mutation in a regulated fashion.

Inhibition of CO activity to cause late flowering

co mutations cause late flowering of Arabidopsis.
Transgenic approaches may be used to reduce CO activity
and thereby delay or prevent flowering in a range of
plant species. A variety of strategies may be employed.

Expression of sense or anti-sense RNAs

In several cases the activity of endogenous plant
genes has been reduced by the expression of homologous
antisense RNA from a transgene, as discussed above.
Similarly, the expression of sense transcripts from a
transgene may reduce the activity of the corresponding
endogenous copy of the gene, as discussed above.
Expression of a CO antisense or sense RNA should reduce
activity of the endogenous gene and cause late
flowering.

Expression of modified versions of the CO protein

Transcription factors and other DNA binding
proteins often have a modular structure in which amino
acid sequences required for DNA binding, dimerisation or
transcriptional activation are encoded by separate
domains of the protein (reviewed by Ptashne and Gann,
1990). This permits the construction of truncated or
fusion proteins that display only one of the functions
of the DNA binding protein. In the case of CO,
modification of the gene in vitro and expression of
modified versions of the protein may lead to dominant
inhibition of the endogenous, intact protein and thereby
delay flowering. This may be accomplished in various
ways, including the following:

Expression of a truncated CO protein encoding only
the DNA binding region.

The zinc-finger containing region of CO may be
required and sufficient to permit binding to DNA. If a
truncated or mutated protein that only encodes the DNA
binding region were expressed at a higher level than the
endogenous protein, then most of the CO binding sites
should be occupied by the mutated version, thereby
preventing binding of the fully active endogenous
protein. Binding of the mutant protein would have the
effect of preventing CO action, because the mutated
protein would not contain any other regions of CO that
might be involved in biological processes such as
transcriptional activation, transcriptional inhibition
or protein-protein interaction.
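
The competition argument above can be illustrated with a
minimal sketch. It assumes, purely for illustration, that
the truncated and endogenous proteins bind CO sites with
equal affinity and that protein is in excess over sites, so
that occupancy simply follows relative abundance. Neither
assumption is established in this text, and the function
name and the expression ratios are hypothetical.

```python
def mutant_occupancy(mutant_level: float, wild_type_level: float) -> float:
    """Fraction of binding sites held by the truncated protein.

    Assumption (not from the specification): both proteins have the
    same DNA affinity and are in excess over binding sites, so the
    share of occupied sites equals the share of total protein.
    """
    if mutant_level < 0 or wild_type_level < 0:
        raise ValueError("levels must be non-negative")
    total = mutant_level + wild_type_level
    if total == 0:
        raise ValueError("at least one protein must be present")
    return mutant_level / total


# Hypothetical expression ratios of truncated to endogenous protein.
for ratio in (1, 5, 10, 50):
    share = mutant_occupancy(ratio, 1)
    print(f"mutant:wild-type = {ratio}:1 -> {share:.0%} of sites held by mutant")
```

Under these assumptions a modest excess of the truncated
protein already displaces most of the endogenous protein,
which is the qualitative point made above.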

In vitro analysis of a murine transcription factor
GF1 that contains zinc-fingers similar to those of CO,
suggests that a truncated CO protein with the properties
described above could be designed. Martin and Orkin
(1990) demonstrated that a truncated version of GF1
containing only the zinc fingers retained DNA binding
activity, but was incapable of transcriptional
activation. Similarly, the zinc-finger containing
PANNIER protein of Drosophila melanogaster is required
to repress activation of genes required for bristle
formation. Mutations in a domain that does not contain
the zinc fingers caused dominant super-repression of
gene activity, probably because these proteins bind DNA
but do not interact with other proteins in the way that
the wild-type protein does (Ramain et al, 1993).

Expression of a mutant CO protein not encoding the 
DNA binding domain 

A second form of inhibitory molecule may be
designed if CO must dimerise, or form complexes with
other proteins, to have its biological effect, and if
these complexes can form without a requirement for CO
being bound to DNA. In this case expression of a CO
protein that is mutated within the DNA-binding domain,
but contains all of the other properties of the
wild-type protein, would have an inhibitory effect. If the
mutant protein were present at a higher concentration
than the endogenous protein and CO normally forms
dimers, then most of the endogenous protein would form
dimers with the mutant protein and would not bind DNA.
Similarly, if CO forms complexes with other proteins,
then the mutant form of CO would participate in the
majority of these complexes which would then not bind
DNA.

Mutant forms of DNA-binding proteins with these
properties have been reported previously. For example,
in yeast cells expression of a protein containing the
transcriptional activation domain of GAL4 was able to
reduce the expression of the CYC1 gene. CYC1 is not
normally activated by GAL4, so it was proposed that the
GAL4 activating domain sequesters proteins required for
CYC1 activation (Gill and Ptashne, 1988). Similarly,
mutations in the zinc finger region of the PANNIER
protein of Drosophila melanogaster have a dominant
phenotype, probably because the mutant proteins
sequester proteins essential for PANNIER activity and
reduce their availability to interact with wild-type
protein (Ramain et al, 1993).

Aspects and embodiments of the present invention
will now be illustrated, by way of example, with
reference to the accompanying figures. Further aspects
and embodiments will be apparent to those skilled in the
art. All documents mentioned in this text are
incorporated herein by reference.

In the Figures:

Figure 1 shows a nucleotide sequence according to
one embodiment of the invention, being the sequence of
the CO ORF obtained from Arabidopsis thaliana (SEQ ID
NO. 1), and the predicted amino acid sequence (SEQ ID
NO. 2). The nucleotide sequence is shown above the
amino acid sequence. The region shown in bold is
thought to encompass both zinc finger domains.

Figure 2 shows the nucleotide sequence of the
GST-II-27 gene promoter (SEQ ID NO. 3). The fragment used
to make the fusion was flanked by the HindIII and NdeI
sites that are shown in bold.

Figure 3 shows the nucleotide sequence of the
genomic DNA comprising the CO gene obtained from
Arabidopsis thaliana, including the single intron,
promoter sequences and sequences present after the
translational termination codon (SEQ ID NO. 4). The
genomic region shown starts 2674 bp upstream of the
translational start site, and ends just after the
polyadenylation site. The CO open reading frame is
shown in bold, and is interrupted by the single intron.

Figure 4 shows the pJIT62 plasmid used as a source
of the CaMV 35S promoter. The KpnI-HindIII fragment,
shown as a dark coloured thick line, was used as a
source of the promoter.

Figure 5 shows a nucleotide sequence according to a
further embodiment of the invention, being a CO ORF
obtained from Brassica napus (SEQ ID NO. 5), and the
predicted amino acid sequence (SEQ ID NO. 6).

Figure 6 shows a nucleotide sequence according to a
further embodiment of the invention, being a second CO
ORF obtained from Brassica napus (SEQ ID NO. 7), and the
predicted amino acid sequence (SEQ ID NO. 8).



EXAMPLE 1 - cloning and analysis of a CO gene 

Cosmid and RFLP markers.

DNA of λ CHS2 was obtained from R. Feinbaum
(Massachusetts General Hospital (MGH), Boston). Total
DNA was used as radiolabelled probe to YAC library
colony filters and plant genomic DNA blots. Cosmids
g6833, 17085, 17861, 19027, 16431, 14534, g5962 and
g4568 were obtained from Brian Hauge (MGH, Boston),
cultured in the presence of 30 µg/ml kanamycin, and
maintained as glycerol stocks at -70°C. Total cosmid
DNA was used as radiolabelled probe to YAC library
colony filters and plant genomic DNA blots. Cosmid
pCIT1243 was provided by Elliot Meyerowitz (Caltech,
Pasadena), cultured in the presence of 100 µg/ml
streptomycin/spectinomycin and maintained as a glycerol
stock at -70°C. pCIT30 vector sequences share
homology to pYAC4 derived vectors, and therefore YAC
library colony filters were hybridised with insert DNA
extracted from the cosmid. Total DNA of pCIT1243 was
used as radiolabelled probe to plant genomic DNA blots.



YAC libraries.

The EG, abi and S libraries were obtained from
Chris Somerville (Michigan State University). The EW
library was obtained from Jeff Dangl (Max Delbruck
Laboratory, Cologne) and the Yup library from Joe Ecker
(University of Pennsylvania). Master copies of the
libraries were stored at -70°C (as described by Schmidt
et al. Aust. J. Plant Physiol. 19: 341-351 (1992)). The
working stocks were maintained on selective Kiwibrew
agar at 4°C. Kiwibrew is a selective, complete minimal
medium minus uracil, and containing 11% Casamino acids.
Working stocks of the libraries were replated using a
96-prong replicator every 3 months.

Yeast colony filters.

Hybond-N (Amersham) filters (8cm x 11cm) containing
arrays of yeast colony DNA from 8-24 library plates were
produced and processed (as described by Coulson et al.
Nature 335:184-186 (1988) and modified as described by
Schmidt and Dean, Genome Analysis, vol. 4: 71-98 (1992)).
Hybridisation and washing conditions were according to
the manufacturer's instructions. Radiolabelled probe
DNA was prepared by random-hexamer labelling.



Yeast chromosome preparation and fractionation by pulsed
field gel electrophoresis (PFGE).

Five millilitres of Kiwibrew was inoculated with a
single yeast colony and cultured at 30°C for 24 h.
Yeast spheroplasts were generated by incubation with
2.5 mg/ml Novozym (Novo Biolabs) for 1 h at room
temperature. Then 1 M sorbitol was added to bring the
final volume of spheroplasts to 50 µl. Eighty
microlitres of molten LMP agarose (1% InCert agarose,
FMC) in 1 M sorbitol was added to the spheroplasts, the
mixture was vortexed briefly and pipetted into plug
moulds. Plugs were placed into 1.5 ml Eppendorf tubes
and then incubated in 1 ml of 1 mg/ml Proteinase K
(Boehringer Mannheim) in 100 mM EDTA, pH 8, 1% Sarkosyl
for 4 h at 50°C. The solution was replaced and the
plugs incubated overnight. The plugs were washed three
times for 30 min each with TE and twice for 30 min with
0.5 x TBE. PFGE was carried out using the Pulsaphor
system (LKB). One-third of a plug was loaded onto a 1%
agarose gel and electrophoresed in 0.5 x TBE at 170 V, 20
s pulse time, for 36 h at 4°C. DNA markers were
concatemers of λ DNA prepared as described by Bancroft
and Wolk, Nucleic Acids Res. 16:7405-7418 (1988). DNA was
visualised by staining with ethidium bromide.

Yeast genomic DNA for restriction enzyme digestion and
inverse polymerase chain reaction (IPCR).

Yeast genomic DNA was prepared essentially as
described by Heard et al. (1989) except that yeast
spheroplasts were prepared as above. Finally, the DNA
was extracted twice with phenol/chloroform, once with
chloroform and ethanol precipitated. The yield from a
5 ml culture was about 10 µg DNA.



Isolation of YAC end fragments by IPCR.

Yeast genomic DNA (100 ng) was digested with AluI,
HaeIII, EcoRV or HincII. The digestions were
phenol-chloroform extracted once and then ethanol precipitated.
The DNA fragments were circularised by ligation in a
volume of 100 µl overnight at 16°C in the presence of 2
U ligase (BRL). After incubation of the ligation
mixture at 65°C for 10 min, IPCR was carried out on 10
µl ligation mixture using inverse primer pairs. The
IPCR conditions and C and D primer pairs have been
described by Schmidt et al. (1992). The JP series are
from M. Hirst (IMM Molecular Genetics Group, Oxford).
After digestion with the indicated enzymes, the
following primer pairs were used:
For left-end IPCR:
AluI, EcoRV; D71 5'tcctgctcgcttcgctactt3'
and C78 5'gcgatgctgtcggaatggac3'
HaeIII; JP1 5'aagtactctcggtagccaag3'
and JP5 5'gtgtggtcgccatgatcgcg3'.
For right-end IPCR:
AluI, HincII; C69 5'ctgggaagtgaatggagacata3'
and C70 5'aggagtcgcataagggagag3'
HaeIII; C69 and JP4 5'ttcaagctctacgccgga3'.
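
As a convenience for checking the primer list above, the
following sketch tabulates the length, GC content and
reverse complement of each IPCR primer. The sequences are
copied from the list as transcribed here, so any
transcription error in the text carries over. The helper
names are illustrative only and are not part of the
original method.

```python
# IPCR primers as listed in the text (5' to 3').
PRIMERS = {
    "D71": "tcctgctcgcttcgctactt",
    "C78": "gcgatgctgtcggaatggac",
    "JP1": "aagtactctcggtagccaag",
    "JP5": "gtgtggtcgccatgatcgcg",
    "C69": "ctgggaagtgaatggagacata",
    "C70": "aggagtcgcataagggagag",
    "JP4": "ttcaagctctacgccgga",
}

_COMPLEMENT = str.maketrans("acgt", "tgca")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a lower-case DNA sequence."""
    return (seq.count("g") + seq.count("c")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lower-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"revcomp {reverse_complement(seq)}")
```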






Aliquots of the IPCR reactions were checked by
electrophoresis on a 1.5% agarose gel and then 1 µl of
the reaction was re-amplified by PCR using the
conditions and F primer series recommended by I. Hwang
(MGH, Boston). Conditions for re-amplification were the
same as for IPCR, except that 30 cycles (1 min, 94°C; 1
min, 45°C; and 3 min, 72°C) were used. The F primers
anneal very near the cloning site and so reduce the
amount of vector sequence present in the PCR product.
In addition they introduce a FokI site very close to the
destroyed cloning site of EW and S YACs.

The primers used for re-amplification of left-end
IPCR products were as follows:
For EG, abi and S YACs:
AluI, F2 5'acgtcggatgctcactatagggatc3'
and C77 5'gtgataaactaccgcattaaagc3';
HaeIII, F2 and JP5; EcoRV, F2 and C78.
For EW and Yup YACs: AluI,
F6 5'acgtcggatgactttaatttattcacta3'
and C77; HaeIII, F6 and JP5; EcoRV, F6 and C78.

The following primers were used for re-amplification
of the right-end IPCR products:
For EG, abi and S YACs: AluI,
F3 5'gacgtggatgctcactaaagggatc3'
and C71 5'agagccttcaacccagtcag3'; HaeIII, F3
and JP4; HincII, F3 and C70.
For EW and Yup YACs: AluI,
F7 5'acgtcggatgccgatcccaagatta3'
and C77; HaeIII, F7 and JP4; HincII, F7 and C70.



The resulting PCR product was purified by cleaving
with the enzyme originally used in the digestion
together with BamHI (EG and abi YACs) or EcoRI (Yup
YACs) and separated on 1% LMP agarose gels. The YAC end
probes were radiolabelled using random priming in molten
agarose, and in appropriate cases digested with FokI to
remove vector sequences and then used as hybridisation
probes.



Isolation of YAC left-end probes by plasmid rescue.

Plasmid rescue of YAC left-end fragments from EG,
abi and EW YACs was carried out as described by Schmidt
et al. (1992).



Isolation of plant genomic DNA.

Plant genomic DNA was isolated from glasshouse
grown plants essentially as described by Tai and
Tanksley, Plant Mol. Biol. Rep. 8: 297-303 (1991),
except that the tissue was ground in liquid nitrogen and
the RNase step omitted. Large-scale (2.5-5 g leaves)
and miniprep (3-4 leaves) DNA was prepared using this
method.

Gel blotting and hybridisation conditions.

Gel transfer to Hybond-N, hybridisation and washing
conditions were according to the manufacturer's
instructions, except that DNA was fixed to the filters
by UV Stratalinker treatment (1200 µJ x 100; Stratagene)
and/or baked at 80°C for 2 h. Radiolabelled DNA was
prepared by random hexamer labelling.

RFLP analysis.

Two to three micrograms of plant genomic DNA was
prepared from the parental plants used in the crosses
and cleaved in a 300 µl volume with 1 of 17 restriction
enzymes: DraI, BclI, CfoI, EcoRI, EcoRV, HincII, BglII,
RsaI, BamHI, HindIII, SacI, AluI, HinfI, Sau3A, TaqI and
MboI. The digested DNA was ethanol precipitated and
separated on 0.7% agarose gels and blotted onto Hybond-N
filters. Radiolabelled cosmid, λ or YAC end probe DNA
was hybridised to the filters to identify RFLPs.

Selection of plants carrying recombination events in the
vicinity of co.

The first step in selecting recombinants was to
create lines carrying the co mutation and closely linked
markers. This was done twice for different flanking
markers. In the first experiment a Landsberg erecta
line carrying the co-2 allele (Koornneef et al. 1991)
and tt4 was made. The tt4 mutation prevents the
production of anthocyanin and has previously been
suggested to be a lesion in the gene encoding chalcone
synthase, because this maps to a similar location (Chang
et al. 1988). The double mutant was crossed to an
individual of the Niederzenz ecotype and the resulting
hybrid self-fertilised to produce an F2 population.
This population was then screened phenotypically for
individuals in which recombination had occurred between
co-2 and tt4. In addition, F2 plants homozygous for
both mutations were used to locate marker RFLP g4568
relative to co-2.

The second experiment was performed by using two
marked lines as parents. The first of these contained
chp7 in a Landsberg erecta background and was derived by
Maarten Koornneef (Wageningen) from a cross between a
line of undefined background (obtained from George
Redei) and Landsberg erecta. The second parent contained
markers lu and alb2. This was selected by Maarten
Koornneef from a cross of a plant of S96 background
carrying the alb2 mutation (M4-6-18; Relichova 1976) to
a line containing co-1 and lu (obtained by Koornneef
from J. Relichova, but originally from G. Redei).
The chp7, co-1 line was then crossed to the lu, alb2
line and an F2 population derived by self-fertilisation
of the hybrid. This population was used to isolate the
recombinants with crossovers between lu and co-1 and
between co-1 and alb2. Both classes of recombinants
were recognised phenotypically as lu homozygotes. These
are only present if recombination occurs between lu and
alb2, because alb2 is lethal when homozygous.



Isolation of the CO (FG) locus:

The CO gene is located on the upper arm of
chromosome 5 and is 2 cM proximal to tt4. The average
physical distance in 1 cM in Arabidopsis is approximately
140 kb. The distance from CHS to CO might be expected
therefore to be ca. 300 kb.
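
The estimate above is a simple conversion from genetic to
physical distance. A minimal sketch using only the figures
quoted in the text (2 cM, and an average of about 140 kb
per cM) is given below. The function name is illustrative,
and the conversion is only a genome-wide average, so local
values may differ.

```python
KB_PER_CM = 140  # approximate Arabidopsis average quoted in the text


def genetic_to_physical_kb(distance_cm: float, kb_per_cm: float = KB_PER_CM) -> float:
    """Convert a genetic distance (cM) to an approximate physical distance (kb)."""
    return distance_cm * kb_per_cm


# 2 cM between tt4 (CHS) and CO gives 280 kb, i.e. roughly 300 kb.
print(genetic_to_physical_kb(2))
```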

We started by hybridising 4 RFLP markers that are
closely linked (within ca. 2 cM) to CHS to the EG and EW
YAC libraries. This produced 18 hybridising YACs.
These were run out on pulsed field gels, Southern blotted
and hybridised to the appropriate RFLP clone. This
confirmed the colony hybridisation result and measured
the size of the YACs; they ranged from 50 kb to 240 kb
in size. The YACs were then digested with restriction
enzymes, hybridised to RFLP marker DNA and the pattern
of fragments compared to that of the marker. This
allowed us to determine whether they contained all the
fragments in the RFLP marker or only some of them and
permitted us to deduce how the YACs lay in relation to
each other. In most cases this arrangement was later
confirmed by the isolation of inverse polymerase chain
reaction (IPCR) generated fragments which are located at
the ends of the Arabidopsis DNA inserted within the YAC,
and hybridisation of these to the appropriate
overlapping YACs.

The short contigs around the RFLP markers were then
extended. We obtained two sets of overlapping cosmid
clones from this area and used the appropriate ones
against the YAC libraries. This identified two new
YACs. End probes derived from most of the 20 YACs we
had identified were then used to screen the libraries
and new YACs extending the cloned region in both
directions were identified. In all a detailed analysis
of 67 YACs was necessary. It allowed us to assemble one
contiguous segment of Arabidopsis DNA which includes
RFLP markers 6833, CHS, pCIT1243 and 5962 and is
approximately 1700 kb long.
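
The idea of joining overlapping clones into a contig can be
sketched as an interval merge. This is an illustration
only: in the work described here overlaps were established
by hybridisation of end probes and RFLP markers, not from
known coordinates, and the clone names and positions below
are hypothetical.

```python
def merge_clones(clones):
    """Merge overlapping clones into contigs.

    clones: iterable of (name, start_kb, end_kb) tuples.
    Returns a list of (start_kb, end_kb, [clone names]) contigs.
    """
    contigs = []
    for name, start, end in sorted(clones, key=lambda c: c[1]):
        if contigs and start <= contigs[-1][1]:
            # Clone overlaps (or abuts) the current contig: extend it.
            last_start, last_end, names = contigs[-1]
            contigs[-1] = (last_start, max(last_end, end), names + [name])
        else:
            # Gap before this clone: start a new contig.
            contigs.append((start, end, [name]))
    return contigs


# Hypothetical YAC positions in kb, for illustration only.
example = [
    ("YAC_a", 0, 150),
    ("YAC_b", 120, 300),
    ("YAC_c", 280, 400),
    ("YAC_d", 600, 750),
]
for start, end, names in merge_clones(example):
    print(f"{start}-{end} kb ({end - start} kb): {', '.join(names)}")
```

In this toy case the first three clones form one contig and
the fourth remains separate until a bridging clone is found,
which mirrors the need to screen the libraries again with
end probes.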

The location of CO within the contig was determined
by detailed RFLP analysis after the isolation of
recombinants containing cross-overs very closely linked
to CO. The recombinants were identified by using
flanking phenotypic markers. First we made a Landsberg
erecta chromosome marked with co and tt4. Then we
crossed this to Niederzenz and screened 1200 F2 plants
for recombinant chromosomes carrying cross-overs between
co and tt4. In this way we found twelve recombinants
which were confirmed by scoring the phenotypes of their
progeny. The rarity of these recombinants confirmed the
extremely close linkage between tt4 and co. These
recombinants were then used to locate CO on the contig.
For example, some of them contain Landsberg DNA on the
tt4 side of the cross-over and Niederzenz DNA on the co
side. DNA isolated during our walk was positioned
relative to CO by using small fragments as RFLP markers
and hybridising them to the DNA extracted from the
recombinants. We used a similar approach on the
proximal side by screening for recombinants between co
and alb2. This work initially located CO between two
YAC end probes which are approximately 300 kb apart.
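
The logic of positioning a cross-over with RFLP markers can
be sketched as follows. Each recombinant chromosome is
scored at a series of ordered markers as Landsberg ('L') or
Niederzenz ('N'); the cross-over lies between the two
adjacent markers where the score changes. The marker names
and scores below are hypothetical and serve only to
illustrate the reasoning.

```python
def crossover_interval(markers, genotypes):
    """Return the adjacent marker pair flanking a single cross-over.

    markers:   marker names in chromosomal order.
    genotypes: one 'L' or 'N' score per marker, in the same order.
    Returns None if the parental origin never changes, and raises
    ValueError if more than one change is seen.
    """
    if len(markers) != len(genotypes):
        raise ValueError("one genotype is needed per marker")
    changes = [
        (markers[i], markers[i + 1])
        for i in range(len(markers) - 1)
        if genotypes[i] != genotypes[i + 1]
    ]
    if len(changes) > 1:
        raise ValueError("more than one cross-over in this interval")
    return changes[0] if changes else None


# Hypothetical ordered markers and scores for one recombinant.
markers = ["m1", "m2", "m3", "m4"]
print(crossover_interval(markers, "LLNN"))  # ('m2', 'm3')
```

Combining the intervals from many recombinants, together
with the flowering phenotype of each, narrows the region in
which the gene can lie.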

To locate CO more accurately within the 300 kb,
more cross-overs between co and the flanking phenotypic
markers were screened for. Using a similar rationale as
that described earlier, a total of 46 cross-overs
between co and alb2 (an interval of 1.6 cM proximal to
CO), and 135 between co and lu (an interval of 5.3 cM
distal to CO) were identified and analysed with
appropriate RFLP markers derived from our contig. This
located the gene to a very short region defined by two
YAC end probes. These were used to screen a cosmid
library provided to us by University of Minnesota, and
a short cosmid contig containing 3 cosmids that spanned
the entire region was constructed. Analysis of these
cosmids indicated that the detailed RFLP mapping had
located CO to a region approximately 38 kb long.

To position the gene within the cosmids, each of
them was introduced into co mutants and the resulting
plants examined to determine which of the cosmids
corrected the co mutant phenotype. Roots of plants
homozygous for co-2 and tt4 mutations were co-cultivated
with Agrobacterium strains containing each cosmid
(Olszewski and Ausubel, 1988; Valvekens et al 1989) and
kanamycin resistant plants regenerated. The regenerants
(T1 generation) were self-fertilised and their progeny
sown on medium containing kanamycin to confirm that they
contained the T-DNA (Table 1).

A total of 5 independent transformants containing
cosmid A, 9 containing cosmid B and 13 containing cosmid
C produced kanamycin resistant T2 progeny and were
studied further. The flowering time of 20-40 plants
from each of these T2 families was measured in the long
day greenhouse. All of the progeny of transgenic plants
made with cosmid A flowered as late as the co-2 mutants,
suggesting that this cosmid did not contain the CO gene.
However, several of the families derived from plants
containing cosmids B and C included early flowering
individuals. In total, 6 of the 9 families derived from
plants harbouring cosmid B and 12 of the 13 derived from
those carrying cosmid C contained plants that flowered
as early as wild-type. All of these early-flowering
individuals produced light coloured seeds indicating
that they carried the tt4 mutation present in the line
used for the transformation, and therefore were not
simply the result of the experiment being contaminated
with seeds of wild-type plants (Experimental
Procedures). These results strongly suggest that the CO
gene is contained in both cosmids B and C.

Further experiments were carried out in the T3
generation to confirm the complementation results. A
total of five T2 early-flowering plants derived from
cosmid B and six from cosmid C were self-fertilised and
studied further in the T3 generation. Each of the T2
plants chosen for this analysis was derived from a
different transformant, was the earliest flowering plant
in the T2 family and was a member of a family that had
shown a ratio of 3 kanamycin resistant seedlings for
each kanamycin sensitive, and therefore probably
contained the transgene at only one locus (Table 1).
All of the seedlings in these T3 families were resistant
to kanamycin demonstrating that the parental T2 plants
were homozygous for the T-DNA. This demonstrated that
the earliest flowering T2 plants were homozygous for the
CO transgene.
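
The inference that a 3:1 ratio of kanamycin resistant to
sensitive seedlings indicates a single transgene locus can
be checked with a chi-square goodness-of-fit test. The
sketch below is illustrative only: the seedling counts are
hypothetical (Table 1 is not reproduced here), and the test
itself is a standard choice rather than one stated in the
text. The critical value 3.841 is the standard 5% threshold
for one degree of freedom.

```python
CHI2_CRITICAL_DF1_P05 = 3.841  # 5% critical value, 1 degree of freedom


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 ratio."""
    total = resistant + sensitive
    if total <= 0:
        raise ValueError("at least one seedling must be scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def fits_single_locus(resistant: int, sensitive: int) -> bool:
    """True if the counts do not differ significantly from 3:1 at the 5% level."""
    return chi_square_3_to_1(resistant, sensitive) < CHI2_CRITICAL_DF1_P05


# Hypothetical seedling counts for two T2 families.
print(fits_single_locus(78, 22))  # True: consistent with one locus
print(fits_single_locus(90, 10))  # False: too many resistant seedlings
```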

Under the long-day conditions used the co-2 mutant
plants flowered considerably later than the wild-type
controls (Table 1). The T3 plants flowered at least as
early as wild-type under defined long-day conditions,
and some individuals flowered earlier than wild-type
(Table 1). This analysis confirmed that cosmids B and C
can correct the effect of the co-2 mutation on flowering
time under long days, suggesting that both of these
cosmids contained CO, and therefore that the gene was in
the region of overlap between them. This region was 6.5
kb long.
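
The step from "both cosmids complement" to "the gene lies
in the overlap" is an interval intersection. The sketch
below uses hypothetical coordinates chosen only so that the
overlap is 6.5 kb, as reported above; the actual positions
of cosmids B and C are not given in this text.

```python
def shared_region(first, second):
    """Return the (start, end) overlap of two intervals, or None if disjoint."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return (start, end) if start < end else None


# Hypothetical positions in kb along the contig.
cosmid_b = (10.0, 30.0)
cosmid_c = (23.5, 45.0)

overlap = shared_region(cosmid_b, cosmid_c)
print(overlap, overlap[1] - overlap[0])  # (23.5, 30.0) 6.5
```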

We determined the sequence of the 6.5 kb that was
shared by cosmids B and C. This contains only one gene
that we can readily identify from the DNA sequence. The
polymerase chain reaction was used to amplify this gene
from three independently isolated co mutants, and
sequencing of these genes demonstrated that all three
contained mutations. This, together with the
complementation analysis, is conclusive evidence that
this is the CO gene. The predicted amino acid sequence
of CO shows no homology to previously reported genes.
However, the amino terminus contains two regions that
are predicted to form zinc fingers, suggesting that the
protein product binds to DNA and is probably a
transcription factor.

Unexpected difficulties in identifying CO within the
300kb region defined by REG17B5 and LEW4A9

1. Locating the gene by more detailed RFLP mapping and
complementation

As mentioned, Putterill et al, Mol. Gen. Genet.
239:145-157 (1993) described location of CO to within a
region of 300kb. To locate CO more accurately by RFLP
mapping, two materials were required: more recombinants
carrying cross-overs within the 300kb region, and more
RFLP markers to use as probes against these
recombinants.

Recombinants between lu and co or between co and
alb2 were selected. A total of 68 cross-overs in the
1.6 cM between lu and co were identified, and 128 in the
5.3 cM between co and alb2. This is equivalent to 196
cross-overs in 6.8 cM, or an average of 29 cross-overs
per cM. Among these recombinants, cross-overs within
the 300kb were unexpectedly under-represented: 300kb is
equivalent to around 1.5 cM, so 43 (29 x 1.5) cross-overs
would be expected in this region. Only 23 were found.
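
The expected number of cross-overs quoted above follows
from the average cross-over density. A minimal sketch of
the arithmetic, using only the figures stated in the text,
is given below; the function names are illustrative.

```python
def crossover_density(crossovers: int, interval_cm: float) -> float:
    """Average number of cross-overs recovered per cM."""
    return crossovers / interval_cm


def expected_crossovers(density_per_cm: float, region_cm: float) -> float:
    """Cross-overs expected in a region if they are spread evenly."""
    return density_per_cm * region_cm


total = 68 + 128                         # 196 cross-overs
density = crossover_density(total, 6.8)  # interval length as stated in the text
rounded_density = round(density)         # the text rounds this to 29 per cM
expected = expected_crossovers(rounded_density, 1.5)

print(f"{density:.1f} per cM, about {rounded_density}")
print(f"expected in 1.5 cM: {expected} (text: 43); observed: 23")
```

The observed 23 is roughly half the number expected on an
even distribution, which is the shortfall described above.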
The analysis of these cross-overs was also
difficult because none of the YAC end probes that fell
within the 300kb could be used as RFLP probes. This was
because none of them detected RFLPs between the parental
lines used to make the recombinants. One RFLP marker
(pCIT1243) was available within the region, and when
this was used to analyse the recombinants it was found
to be between REG17B5 and CO, thereby positioning the
gene between pCIT1243 and LEW4A9. However, a more
accurate position of the gene could not be achieved by
this method because of the lack of suitable probes.

The distribution of cross-overs between pCIT1243
and LEW4A9 was asymmetric: there was one between
pCIT1243 and CO and 19 between CO and LEW4A9. We
guessed that the gene was likely to be close to
pCIT1243. A pool of probes (LEG4C9, Labi19E1, pCIT1243,
LEG21H11 and REG4C9) from this region was therefore used
to screen a cosmid library to provide a series of cosmid
clones extending from pCIT1243 towards LEW4A9. Analysis
of these clones with individual probes showed that the
three cosmids A, B and C extended from pCIT1243 in the
direction required. These were then used as RFLP
markers and the gene demonstrated to be on the cosmids.

The procedure was therefore more complex than that
envisaged in the Putterill et al paper because of the
difficulty in making enough recombinants within the
300kb region, and in identifying suitable RFLP markers.

2. Identifying the gene by complementation

The three cosmids A, B and C were introduced into
mutant plants, and it was shown that B and C could
correct the effect of the mutation. The gene must
therefore be on the DNA shared by B and C, but the
method proposed in the Putterill paper for final
identification of the CO gene failed. It had been
assumed that one would be able to identify a transcript
for CO by using the complementing DNA as a probe against
Northern blots, or that one of the seven alleles would
show a re-arrangement on Southern blots that would lead
to the gene. In fact, we could not detect the CO
transcript on Northern blots nor any re-arrangement
indicative of where the gene might be.

The failure of this approach led us to sequence the
genomic DNA that complemented the mutation. Computer
analysis of this DNA identified two open reading frames
adjacent to each other and we guessed that these might
represent the CO gene. We still had no evidence that
these ORFs were actively transcribed, as one would expect
for a gene, because no transcript was detectable on
Northern blots and no cDNA was detected in several cDNA
libraries. We therefore used the polymerase chain
reaction (PCR) to amplify a cDNA from RNA preparations.
This showed that these two ORFs did indeed represent one
active gene. Sequencing co alleles then confirmed that
they contained single base changes, or in one case a 9bp
deletion, that would not have been detected by the
approaches proposed in the Putterill et al paper.

Gene Structure

To determine the gene structure, a cDNA for the CO
gene was identified using RT-PCR (Experimental
Procedures). The sequence of the cDNA contains an 1122
bp ORF that is derived from both ORFs identified in the
genomic sequence by removal of a 233 bp intron.
Translation of this open reading frame is predicted to
form a protein containing 373 amino acids with a
molecular mass of 42 kd. The transcription start site
was not determined, but an in-frame translation
termination codon is located three codons upstream of
the ATG, indicating that the entire translated region
was identified. The 3' end of the transcript was
located by sequencing four fragments produced by
3'-RACE. They all contained the poly-A tail at different
positions within 5 bases of each other.
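
The figures above are mutually consistent, as the following
sketch shows: an 1122 bp ORF corresponds to 374 codons, of
which one is the stop codon, leaving 373 amino acids. The
mass estimate uses an average of about 110 Da per residue,
which is a common rule of thumb and an assumption here, not
a value taken from the text; the 42 kd quoted above would
have been calculated from the actual sequence.

```python
AVERAGE_RESIDUE_DA = 110  # rule-of-thumb average residue mass (assumption)


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_bp // 3 - 1


def approximate_mass_kda(residues: int) -> float:
    """Rough protein mass in kDa from the number of residues."""
    return residues * AVERAGE_RESIDUE_DA / 1000


orf_bp, intron_bp = 1122, 233
residues = protein_length(orf_bp)
print(residues)                        # 373
print(approximate_mass_kda(residues))  # rough estimate, close to 42 kd
print(orf_bp + intron_bp)              # 1355 bp from ATG to stop in genomic DNA
```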

Available data bases were searched for proteins
sharing homology with the predicted translation product
of the CO gene. Searching the PROSITE directory
detected no motifs within the CO protein. Moreover, a
FASTA search comparing the CO protein sequence with
those in GenBank detected no significant homologies.
Direct comparison of the CO sequence with that of
LUMINIDEPENDENS, the other flowering time gene cloned
from Arabidopsis (Lee et al, 1994), detected no
homology. However, analysis of the protein sequence by
eye identified a striking arrangement of cysteine
residues that is present in two regions near the amino
terminus of the CO protein. Each of these regions
contains four cysteines in a C-X2-C-X16-C-X2-C
arrangement, that is similar to the zinc-finger domains
of GATA-1 transcription factors (C-X2-C-X17-C-X2-C).
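
The cysteine arrangement described above can be searched
for mechanically with a regular expression. The sketch
below is illustrative: the test sequence is a made-up toy
string, not the CO protein, and the function name is
hypothetical. A lookahead is used so that overlapping
candidate motifs are also reported.

```python
import re


def find_zinc_fingers(protein: str, spacer: int = 16):
    """Find C-X2-C-X(spacer)-C-X2-C motifs in a one-letter protein sequence.

    Returns a list of (zero-based start, matched motif) tuples.
    Use spacer=16 for the CO arrangement and spacer=17 for GATA-1.
    """
    pattern = re.compile(rf"(?=(C.{{2}}C.{{{spacer}}}C.{{2}}C))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(protein)]


# Toy sequence containing one motif with a 16-residue spacer.
toy = "MA" + "CAAC" + "G" * 16 + "CAAC" + "KR"

print(find_zinc_fingers(toy, spacer=16))
print(find_zinc_fingers(toy, spacer=17))  # no match with the GATA-1 spacing
```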

Comparison of two 43 amino acid stretches that are
directly adjacent to each other within the predicted CO
protein sequence, each of which contains one of the
proposed zinc fingers, indicates striking homology: 46%
of the amino acids are identical and 86% are either
identical or related. The conservation is most apparent
on the carboxy side of each finger, which is again
reminiscent of GATA1 transcription factors, in which
this region is a basic domain required for DNA binding
and is highly conserved (Trainor et al, 1990; Brendel
and Karlin, 1989; Ramain et al, 1993). In the CO
protein this region is also positively charged: there is
a net positive charge of 6 in the region adjacent to the
amino finger and of 3 in the one next to the carboxy
finger.
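
The percentages and net charges quoted above are simple counts over aligned residues. A minimal sketch of how such figures can be computed is given below. The grouping of "related" residues and the treatment of histidine as neutral are assumptions of this example (the specification does not state which scheme was used), and the short test strings are hypothetical, not CO sequence.

```python
# Assumed conservative-substitution groups; NOT taken from the specification.
SIMILAR_GROUPS = [set("KRH"), set("DE"), set("ST"), set("NQ"),
                  set("AVLIM"), set("FYW")]


def identity_and_similarity(a: str, b: str) -> tuple:
    """Percent identical, and percent identical-or-related, for two
    equal-length, already aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    identical = sum(x == y for x, y in zip(a, b))
    related = sum(
        x != y and any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in zip(a, b)
    )
    n = len(a)
    return 100.0 * identical / n, 100.0 * (identical + related) / n


def net_charge(seq: str) -> int:
    """Count K and R as +1, D and E as -1; histidine is treated as neutral."""
    positive = sum(seq.count(r) for r in "KR")
    negative = sum(seq.count(r) for r in "DE")
    return positive - negative


print(identity_and_similarity("KRAD", "KKAE"))  # hypothetical aligned pair
print(net_charge("KRKDE"))                      # hypothetical basic stretch
```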

Comparison of the protein sequence of the CO zinc
fingers with 116 amino acids that contain the zinc
fingers of hGATA1 and are conserved between members of
the GATA1 family (see Ramain et al, 1993) using the
FASTA programme of the Wisconsin package identified one
81 amino acid region of homology that spans both zinc
fingers of CO and aligns the cysteines of the zinc
fingers of hGATA1 and those of CO. Between these
regions of CO and hGATA1, 21% of the amino acids are
identical and 65% are similar or identical. Therefore,
although CO is not a member of the GATA1 family, it
shows similarity to them in the region of the zinc
fingers and represents a new class of zinc-finger
containing protein.

A further indication that these regions are
important for CO activity is that the mutations in both
the co-1 and co-2 alleles affect residues that are
conserved between the proposed finger regions: co-2
changes an arginine on the carboxy side of the
N-terminal finger to a histidine, and the co-1 deletion
removes three amino acids from the carboxy side of the
C-terminal finger.

Expression of CO mRNA in long and short day grown plants
No CO cDNA clones were found by screening several
Arabidopsis cDNA libraries and the mRNA was not detected
on Northern blots of polyA mRNA extracted from seedlings
at the 3-4 leaf stage (data not shown). RT-PCR followed
by Southern blotting and hybridisation to a CO-specific
probe was therefore used to detect the CO transcript.
The RNA used in these experiments was isolated from
seedlings at the 3-4 leaf stage, because this is just
before the floral bud is visible under long days and
therefore seemed a likely time for the gene to be
expressed.

WO 96/14414 PCT/GB95/02561

Six independent RNA preparations made from plants
growing under long days all produced a hybridising
fragment of the size expected for the CO cDNA. No
difference in abundance of the CO transcript was
detected between wild-type and co-2 mutant plants,
suggesting that activity of the CO gene is not required
to promote its own transcription.



Flowering time under long days is influenced by CO gene
dosage.

Plants that are heterozygous for a wild-type allele
and either co-1 or co-2 flower at a time intermediate
between co homozygotes and Landsberg erecta under long
days (Koornneef et al, 1991; F. Robson, unpublished).
Sequencing of these mutant alleles demonstrated that
they both contain in-frame alterations to the amino acid
sequence. This might suggest two models for the partial
dominance of co. The mutant alleles might give rise to
an altered product that interferes with floral
induction, or the mutations might cause loss of function
and the two-fold reduction in the level of the CO
protein in a heterozygote lead to a delay in flowering
time (haplo-insufficiency). The haplo-insufficiency
explanation is favoured by the results included herein.
In the complementation experiments, transgenic
plants containing two copies of cosmids B or C and
homozygous for the co-2 allele often flowered at the
same time as wild-type plants under long days. If the
mutant allele encoded a product that interfered with the
activity of the wild-type protein, then this would not
be expected to occur. Moreover, the need to use RT-PCR
to detect the CO transcript suggests that it is present
at very low levels, which is consistent with the
possibility that further reductions in transcript level
cause late flowering.



Increases in the dosage of CO can lead to slightly
earlier flowering under long days.

This was concluded from the observation that some of
the transgenic lines carrying extra copies of the CO
gene flowered slightly earlier than wild-type plants
(Tables 1 and 2). This observation, together with the
haplo-insufficiency phenotype discussed above, suggests
that the level of expression of CO is a critical
determinant of flowering time of Arabidopsis under long
days.

METHODS

Growth conditions and measurement of flowering time
Flowering time was measured under defined
conditions by growing plants in Sanyo Gallenkamp
Controlled Environment rooms at 20°C. Short days
comprised a photoperiod of 10 hours lit with 400 watt
metal halide power star lamps supplemented with 100 watt
tungsten halide lamps. This provided a level of
photosynthetically active radiation (PAR) of 113.7
µmoles photons m⁻² s⁻¹ and a red:far-red light ratio of
2.41. A similar cabinet and lamps were used for the
long day. The photoperiod was for 10 hours under the
same conditions used for short days and extended for a
further 8 hours using only the tungsten halide lamps.
In this cabinet the combination of lamps used for the 10
hour period provided a PAR of 92.9 µmoles photons m⁻² s⁻¹
and a red:far-red ratio of 1.49. The 8 hour extension
produced PAR of 14.27 µmoles m⁻² s⁻¹ and a red:far-red
ratio of 0.66.
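
To put the light levels above on a common footing, the total photosynthetically active light delivered per day in each regime can be estimated from the PAR values and photoperiods. This is a sketch under the assumption that PAR was constant within each period; the function name is illustrative and the calculation is not part of the original methods.

```python
def daily_light_integral(periods):
    """periods: iterable of (PAR in umol photons m^-2 s^-1, duration in hours).
    Returns the daily total in mol photons m^-2 day^-1."""
    total_umol = sum(par * hours * 3600 for par, hours in periods)
    return total_umol / 1e6


# Values quoted in the text for the two cabinets.
short_day = daily_light_integral([(113.7, 10)])
long_day = daily_light_integral([(92.9, 10), (14.27, 8)])

print(round(short_day, 2), round(long_day, 2))
```

On these figures the two regimes deliver broadly similar daily totals (about 4.09 and 3.76 mol m⁻² per day), so the treatments differ mainly in day length rather than in the quantity of photosynthetic light.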

The flowering times of large populations of plants
were measured in the greenhouse. In the summer the
plants were simply grown in sunlight. In winter
supplementary light was provided so that the minimum
daylength was 16 hours.

To measure flowering time, seeds were placed at 4°C
on wet filter paper for 4 days to break dormancy and
were then sown on soil. Germinating seedlings were
usually covered with cling film or propagator lids for
the first 1-2 weeks to prevent dehydration. Flowering
time was measured by counting the number of leaves,
excluding the cotyledons, in the rosette at the time the
flower bud was visible. Leaf numbers are shown with the
standard error at 95% confidence limits. The number of
days from sowing to the appearance of the flower bud was
also recorded, but is not shown. The close correlation
between leaf number and flowering time was previously
demonstrated for Landsberg erecta and co alleles
(Koornneef et al, 1991).
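
The leaf-number summaries described above (a mean with 95% confidence limits) can be sketched as follows. The exact calculation used for the tables is not given in the specification, so this example assumes the usual form of mean plus or minus a critical value times the standard error of the mean; the default critical value of 1.96 is the large-sample normal approximation, and the leaf counts are hypothetical.

```python
import math
import statistics


def mean_with_95_limits(leaf_counts, critical=1.96):
    """Return (mean, half-width of the 95% interval).

    `critical` defaults to the normal approximation; for small samples a
    Student's t value for n - 1 degrees of freedom would be more appropriate.
    """
    n = len(leaf_counts)
    mean = statistics.mean(leaf_counts)
    sem = statistics.stdev(leaf_counts) / math.sqrt(n)
    return mean, critical * sem


# Hypothetical rosette leaf counts for four plants.
mean, half_width = mean_with_95_limits([4, 5, 5, 6])
print(f"{mean:.1f} +/- {half_width:.1f}")
```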

Plant material
The standard wild-type genotype used was
Arabidopsis thaliana Landsberg erecta. The co-1
mutation was isolated by Redei (1962) and is in an
ERECTA background that in our experiments showed no
detectable RFLPs or sequence variation from Landsberg
erecta. The co-2 allele was isolated in Landsberg
erecta (Koornneef et al, 1991). The details of the
lines used for the accurate RFLP mapping of co were
described previously (Putterill et al, 1993).

In all cases described, lines carrying co-2 also
carried tt4, although in order not to over-complicate
the genotype descriptions in the text this is not
mentioned. The tt4 mutation is within the chalcone
synthase gene and prevents anthocyanin accumulation in
the seed coat, but does not affect flowering time
(Koornneef et al, 1983). The mutation is located on
chromosome 5, approximately 3.3 cM from co (Putterill et
al, 1993). The use of a co-2 tt4 line was useful in
confirming that individual plants did carry the co-2
mutation.



RNA extractions

RNA was extracted using a method which is a
modified version of that described by Stiekema et al
(1988). Approximately 5 g of tissue frozen in liquid
nitrogen was ground in a coffee grinder and extracted
with a mixture of 15 ml of phenol and 15 ml of
extraction buffer (50 mM Tris pH 8, 1 mM EDTA, 1% SDS).
The mixture was shaken, centrifuged and 25 ml of the
aqueous layer recovered. This was then shaken
vigorously with a mixture of 0.7 ml 4 M sodium chloride,
10 ml phenol and 10 ml of chloroform. The aqueous layer
was recovered after centrifugation and extracted with 25
ml of chloroform. The RNA was then precipitated from 25
ml of the aqueous layer by the addition of 2 ml of 10 M
LiCl, and the precipitate recovered by centrifugation.
The pellet was dissolved in 2 ml DEPC water and the RNA
precipitated by the addition of 0.2 ml of 4 M sodium
chloride and 4 ml of ethanol. After centrifugation the
pellet was dissolved in 0.5 ml of DEPC water and the RNA
concentration determined.

DNA extractions
Arabidopsis DNA was extracted by a CTAB extraction
method described by Dean et al (1992).

Isolation of cDNA by RT-PCR

Total RNA was isolated from whole seedlings at the
2-3 leaf stage growing under long days in the
greenhouse. For first strand cDNA synthesis, 10 µg of
RNA in a volume of 10 µl was heated to 65°C for 3
minutes, and then quickly cooled on ice. 10 µl of
reaction mix was made containing 1 µl of RNAsin, 1 µl of
standard dT17-adapter primer (1 Frohman et al,
1988), 4 µl of 5x reverse transcriptase buffer (250 mM
Tris-HCl pH 8.3, 375 mM KCl, 15 mM MgCl2), 2 µl DTT
(100 mM), 1 µl dNTP (20 mM), 1 µl reverse transcriptase
(200 units, M-MLV, Gibco). This reaction mix was then
added to the RNA creating a final volume of 20 µl. The
mixture was incubated at 42°C for 2 hours and then
diluted to 200 µl with water.

10 µl of the diluted first strand synthesis reaction
was added to 90 µl of PCR mix containing 4 µl 2.5 mM
dNTP, 10 µl 10x PCR buffer (Boehringer plus Mg), 1 µl of
a 100 ng/µl solution of each of the primers, 73.7 µl of
water and 0.3 µl of 5 units/µl Taq polymerase (Boehringer
or Cetus Amplitaq). The primers used were CO49
(5' GCTCCCACACCATCAAACTTACTAC; 5' end located 38 bp
upstream of translational start of CO) and CO50
(5' CTCCTCGGCTTCGATTTCTC; 5' end located 57 bp upstream
of translational termination codon of CO). The reaction
was performed at 94°C for 1 minute, 34 cycles of 55°C
for 1 minute, 72°C for 2 minutes and then finally at
72°C for 10 minutes.
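
The component volumes listed above can be checked against the stated 90 µl of PCR mix, and the final concentrations in the 100 µl reaction derived from them. The sketch below uses only the volumes and stock concentrations given in the text; it assumes two primers at 1 µl each (CO49 and CO50), and the variable names are illustrative.

```python
# Volumes in microlitres, as listed in the text.
pcr_mix_ul = {
    "dNTP (2.5 mM stock)": 4.0,
    "10x PCR buffer": 10.0,
    "primer CO49 (100 ng/ul)": 1.0,
    "primer CO50 (100 ng/ul)": 1.0,
    "water": 73.7,
    "Taq polymerase (5 units/ul)": 0.3,
}
template_ul = 10.0  # diluted first strand synthesis reaction

mix_total_ul = sum(pcr_mix_ul.values())
final_volume_ul = mix_total_ul + template_ul

# Final amounts in the complete reaction.
final_dntp_mM = 2.5 * pcr_mix_ul["dNTP (2.5 mM stock)"] / final_volume_ul
buffer_strength = 10.0 * pcr_mix_ul["10x PCR buffer"] / final_volume_ul
taq_units = 5.0 * pcr_mix_ul["Taq polymerase (5 units/ul)"]

print(round(mix_total_ul, 1), round(final_volume_ul, 1))
print(round(final_dntp_mM, 3), round(buffer_strength, 1), round(taq_units, 1))
```

The components sum to the stated 90 µl, giving a 100 µl reaction at 1x buffer.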

20 µl of the reaction was separated through an
agarose gel, and the presence of a fragment of the
expected size was demonstrated after staining with
ethidium bromide. The DNA was transferred to a filter,
and the fragment of interest was shown to hybridise to a
short DNA fragment derived from the CO gene. The
remainder of the PCR reaction was loaded onto another
gel, the amplified fragment was extracted, treated with
T4 DNA polymerase and ligated to Bluescript vector
(Stratagene) cleaved with EcoRV. The PCR reaction was
done in duplicate, and two independently amplified cDNAs
were sequenced to ensure that any PCR induced errors
were detected.
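
Basic properties of the two RT-PCR primers quoted above (CO49 and CO50) can be tabulated with a few lines of code. This is an illustrative sketch: the helper names are assumptions of this example, and no melting temperatures are given because the specification does not state how annealing conditions were chosen.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "CO49": "GCTCCCACACCATCAAACTTACTAC",
    "CO50": "CTCCTCGGCTTCGATTTCTC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


for name, seq in PRIMERS.items():
    print(name, len(seq), f"{gc_fraction(seq):.2f}", reverse_complement(seq))
```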



Isolation of cDNA fragments by 3' RACE

First strand cDNA synthesis was performed using the
same conditions, RNA preparation and dT17-adapter as
described above for RT-PCR. The PCR was then performed
using the standard adapter primer (5' gactcgagtcgacatcg;
Frohman et al, 1988) and the CO49 primer described
above. The PCR conditions were the same as described
above, except that the amplification cycle was preceded
by a 40 minute extension at 72°C. 20 µl of the reaction
was separated through an agarose gel, and a smear of
fragments between 550 bp and 1.6 kb in length was
detected. The remainder of the reaction was loaded on a
similar gel, the region predicted to contain fragments
of 1-2 kb was excised, the DNA extracted and subjected
to a second round of PCR using the adapter primer and
another CO-specific primer (CO28,
5' tgcagattctgcctacttgtgc, 5' end located 94 bp
downstream of translational start site of CO). When
this PCR was monitored on an agarose gel a fragment
around the expected size of 1.3 kb was detected. This
fragment was extracted from the gel, treated with T4 DNA
polymerase and ligated to Bluescript DNA cleaved with
EcoRV. Four amplified fragments recovered from two
independent amplifications were sequenced entirely. All
four were polyadenylated at slightly different
positions, as described in the text.



Detection of CO transcript by RT-PCR

First strand synthesis was performed exactly as
described above for the method used to isolate a cDNA
clone, except that the RNA was isolated from plants
grown in controlled environment cabinets at different
stages. All samples were harvested and analysed in
duplicate. The primers used to amplify CO cDNA are
described in the text and previously in Experimental
Procedures. The primers used to amplify the cDNA of the
gene used as a control were CO1
(5' TGATTCTGCCTACTTGTGCTC) and CO2
(5' GCTTGGTTTGCCTCTTCATC).

DNA sequencing

The Sanger method was used to sequence fragments of
interest inserted in a Bluescript plasmid vector.
Reactions were performed using a Sequenase kit (United
States Biochemical Corporation).



Isolation of clones containing each of the seven co
alleles

DNA was extracted from plants homozygous for each
of the alleles. Approximately 1 ng of genomic DNA was
diluted to 10 µl with water and added to 90 µl of
reaction mix, as described above except that primers
CO41 (5' ggtcccaacgaagaagtgc; 5' end located 263 bp
upstream of translational start codon of CO) and CO42
(5' cagggaggcgtgaaagtgt; 5' end located 334 bp downstream
of translational stop codon of CO) were used. The PCR
conditions were: 94°C for 3 minutes, followed by 34
cycles of 94°C for 1 minute, 55°C for 1 minute, 72°C for
2 minutes and then finally 72°C for 10 minutes. In each
case this produced a major fragment of the expected
size, 1.95 kb. The PCR was carried out in duplicate for
each allele. In each case the reactions were extracted
with phenol and chloroform, ethanol precipitated and
treated with T4 DNA polymerase. The reactions were then
separated through an agarose gel, the fragment purified
and ligated to SK+ Bluescript cleaved with EcoRV.
Ligations were introduced into E. coli DH5 alpha and the
recombinant plasmids screened by colony PCR for those
carrying an insertion of the expected size. The DNA
sequences of two independently amplified fragments
derived from each allele were determined.
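
The expected 1.95 kb product can be reconciled with figures given elsewhere in the specification. The sketch below is an approximation under stated assumptions: the coding sequence is taken as 373 codons plus a stop codon, the single intron as 233 bp, and the primer offsets (263 bp and 334 bp) as measured from the ends of that coding region; the exact counting convention for the offsets is not given, so the result is only expected to agree to within a few base pairs.

```python
def expected_genomic_amplicon_bp(upstream: int, coding: int,
                                 intron: int, downstream: int) -> int:
    """Approximate genomic PCR product length from its component parts."""
    return upstream + coding + intron + downstream


coding_bp = 3 * (373 + 1)  # 373 codons plus the stop codon (assumption)
size = expected_genomic_amplicon_bp(
    upstream=263, coding=coding_bp, intron=233, downstream=334
)
print(size, "bp, i.e. about", round(size / 1000, 2), "kb")
```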



Screening phage and cosmid libraries

A lysate of the cosmid library (Olszewski and
Ausubel, 1988) was used to infect E. coli DH5 alpha, and
twenty thousand colonies were screened with the probes
described in the text. Three cDNA libraries were
screened to try to identify a CO cDNA. The numbers of
plaques screened were 5 x 10⁵ from the "aerial parts"
library (supplied by EC Arabidopsis Stock Center, MPI,
Cologne), 3 x 10⁵ plaques of a library made from plants
growing in sterile beakers (made by Dr A. Bachmair and
supplied by the EC Arabidopsis Stock Center) and 1 x 10⁶
plaques of the CD4-71-PRL2 library (supplied by the
Arabidopsis Biological Resource Center at Ohio State
University).

Transformation of Arabidopsis

The cosmids containing DNA from the vicinity of CO
were mobilised into Agrobacterium tumefaciens C58C1, and
the T-DNA introduced into Arabidopsis plants as
described by Valvekens et al, 1988. Roots of plants
grown in vitro were isolated and grown on callus-inducing
medium (Valvekens et al, 1988) for 2 days. The
roots were then cut into short segments and
co-cultivated with Agrobacterium tumefaciens carrying the
plasmid of interest. The root explants were dried on
blotting paper and placed onto callus-inducing medium
for 2-3 days. The Agrobacterium were washed off, the
roots dried and placed onto shoot-inducing medium
(Valvekens et al, 1988) containing vancomycin to kill
the Agrobacterium and kanamycin to select for
transformed plant cells. After approximately 6 weeks
green calli on the roots start to produce shoots. These
are removed and placed in petri dishes or magenta pots
containing germination medium (Valvekens et al, 1988).
These plants produce seeds in the magenta pots. These
are then sown on germination medium containing kanamycin
to identify transformed seedlings containing the
transgene (Valvekens et al, 1988).



EXAMPLE 2 - Construction of promoter fusions to the CO
open reading frame

A PvuII-EcoRV fragment containing the entire CO
gene was inserted into the unique EcoRV site of the
Bluescript™ plasmid. The CO gene fragment was inserted
in the orientation such that the end defined by the
EcoRV site was adjacent to the HindIII site within the
Bluescript™ polylinker. This plasmid was called pCO1.
The PvuII-EcoRV fragment inserted in pCO1 contains two
HindIII sites both 5' of the point at which translation
of the CO protein is initiated. Cleavage of pCO1 with
HindIII produces a fragment that contains the entire CO
open reading frame from 63 bp upstream of the initiation
of translation to the PvuII site which is downstream of
the polyadenylation site, as well as all of the
Bluescript vector from the PvuII/EcoRV junction created
by the ligation event to the HindIII site within the
polylinker. Ligation of a promoter-containing fragment
in the appropriate orientation to this fragment creates
a fusion of the promoter to the CO open reading frame.
For instance, a variety of promoters may be inserted at
this position, as discussed below.



A GSTII promoter fusion to the CO open reading frame

The GSTII promoter-containing fragment was derived
from plasmid pGIE7 (supplied by Zeneca) as a
HindIII-NdeI fragment, whose sequence is shown in
Figure 2. An oligonucleotide adapter (5' TACAAGCTTG) was
inserted at the NdeI site to convert it into a HindIII
site. The resulting plasmid was then cleaved with
HindIII, and the promoter-containing fragment ligated to
the HindIII fragment containing the CO open reading
frame. A recombinant plasmid that contained the GSTII
promoter in the orientation such that transcription
would occur towards the CO open reading frame was
identified by PstI digestion. The GSTII-CO fusion was
then moved into a binary vector described by Jones et al
(1992) as a ClaI-XbaI fragment.

The binary vector may be introduced into an
Agrobacterium tumefaciens strain and used to introduce
the fusion into dicotyledonous species, or the fusion
may be introduced into monocotyledonous species by a
naked DNA transformation procedure. Protocols for
transformation have been established for many species,
as discussed earlier.
The GSTII promoter may be used to induce expression
of the CO gene by application of an exogenous inducer
such as the herbicide safeners dichloramid and
flurazole, as described in WO93/01294 (Imperial Chemical
Industries Limited).



A heat shock promoter fusion to the CO open reading
frame

An alternative inducible system makes use of the
well characterised soybean heat shock promoter,
Gmhsp17.3B, which is induced in response to exposure to
high temperatures in a variety of plant species
(discussed by Balcells et al, 1994). The promoter is
available as a 440 bp XbaI-XhoI fragment (Balcells et
al, 1994) which after treatment with T4 DNA polymerase
may be inserted into pCO1 cleaved with HindIII, as
described above for the GSTII fusion. The resulting
fusion may then be introduced into the binary vector,
Agrobacterium tumefaciens and transgenic plants, as
described earlier. CO expression may be induced by
exposing plants to temperatures of approximately 40°C.



Fusion to the CO gene of a modified CaMV 35S promoter
containing tetracycline resistance gene operators

A modified CaMV 35S promoter which contains three
operators from the bacterial tetracycline resistance
gene has been developed as a chemically inducible
system. In the presence of the tetracycline gene
repressor protein this promoter is inactive, but this
repression is overcome by supplying plants with
tetracycline (Gatz et al, 1992). This is an
alternative chemically inducible promoter which may be
fused to the CO open reading frame. The promoter is
available as a SmaI-XbaI fragment (Gatz et al, 1992)
which after treatment with T4 DNA polymerase may be
inserted into pCO1 cleaved with HindIII as described
earlier. After introduction of this fusion into plants
also containing the repressor gene, CO expression may be
induced by supplying the plants with tetracycline.

A CaMV 35S promoter fusion to the CO open reading frame

The CaMV 35S promoter was isolated from plasmid
pJIT62 (physical map of which is shown in Figure 4).
The KpnI-HindIII fragment containing the CaMV 35S
promoter was fused to the CO open reading frame by
ligation to plasmid pCO1 cleaved with HindIII and KpnI.
The single KpnI site was then converted to a ClaI site
by insertion of an adapter oligonucleotide
(5' TATCGATAGTAC), and then a ClaI-BamHI fragment
containing the promoter fused to the CO ORF was inserted
into a binary vector. The fusion may be introduced into
transgenic plants either by the use of Agrobacterium
tumefaciens or as naked DNA, as described earlier.

Fusion of the meri 5 promoter to the CO open reading
frame

The meri 5 promoter is available as a 2.4 kb
BglII-StuI fragment (Medford et al, 1991). This may be
treated with T4 DNA polymerase and inserted into the
HindIII site of pCO1 as described above. The fusion may
then be introduced into transgenic plants, as described
above.

EXAMPLE 3 - Flowering time under short days of plants
carrying extra copies of CO

Under short day conditions wild-type plants and co-2
homozygotes both flower at approximately the same time
(Table 1), suggesting that the CO product is not
required for flowering under these conditions. However,
under short days, several of the co-2 tt4 families
carrying the T-DNAs derived from cosmids B and C
flowered earlier than both the parental co-2 line and
wild type (Table 1). In particular, 2 lines (4 and 6)
carrying cosmid C flowered much earlier than wild type.
This suggested that in some families a transgenic copy
of CO was expressed at a higher level than the original
copy, or expressed ectopically, and that this led to
earlier flowering under short days than that of
wild-type plants.

Cosmid B was also introduced into wild-type
Landsberg erecta plants and T2 plants homozygous for the
transgene at a single locus were identified in the same
way as described above (Table 1). Of the 3 independent
transformants analysed in the T3 generation, one
flowered slightly earlier than wild-type plants under
long days, and significantly earlier under short days
(Table 1). This again suggested that, at least at some
chromosomal locations, extra copies of the CO gene can
cause early flowering.



EXAMPLE 4 - Influencing flowering characteristics using
a CaMV 35S promoter/CO gene fusion

A fusion of a CaMV 35S promoter to the CO open
reading frame was introduced into co mutant Arabidopsis
plants. First the ClaI-BamHI fragment described in
Example 2 was inserted into the ClaI-BamHI sites of
binary vector SLJ1711 (Jones et al., 1992). An
Agrobacterium tumefaciens strain carrying this vector
was then used for transformation of Arabidopsis root
explants, followed by regeneration of transformed plants
as described by Valvekens et al. (1988).

The resulting transgenic plants flowered
significantly earlier than wild-type under both
inductive and non-inductive conditions. For example,
under inductive long-day conditions, wild-type plants
flowered after forming approximately 5 leaves, while the
transgenic plants flowered with 3-4 leaves. Under
non-inductive short days, wild-type plants flowered with
approximately 20 leaves, while the transgenic plants
formed 3-4 leaves. The use of promoter fusions to
increase the abundance of the CO mRNA, or to alter the
specificity of CO transcription, can therefore be used
to lead to dramatically earlier flowering than that of
wild-type plants.

In addition, some of the transgenic plants carrying
the fusion of the CaMV 35S promoter to the CO gene
formed a terminal flower at the end of the shoot. The
shoot of wild-type plants shows indeterminate growth,
growing and forming flowers on the sides of the shoot
indefinitely. However, terminal flower (tfl) mutants
show determinate growth, terminating shoot development
prematurely by forming a flower at the apex of the
shoot. In wild-type plants, the TFL gene is thought to
prevent the formation of flowers at the apex of the
shoot, by preventing the expression of genes that
promote flower development, such as LEAFY (LFY), in the
apical cells. This is supported by the observations
that LFY is expressed in the shoot apex of tfl mutants
but not wild-type plants, and that fusions of the CaMV
35S promoter to LFY cause transgenic plants to form a
terminal flower (Weigel and Nilsson, 1995). While not
intending to be bound by any particular theory, the
fusion of CO to the CaMV 35S promoter might therefore
cause a terminal flower by activating genes such as LFY
at the apex of the shoot.

The two phenotypes caused by the CO fusion to the
CaMV 35S promoter, early flowering and the formation of
a terminal flower, may be separated by the use of other
promoters. For example, terminal flower formation might
be optimised by using a promoter, such as that of the
meri 5 gene mentioned above, that is expressed mainly in
the apical meristem, while early flowering without a
terminal flower might result from expressing the gene
from promoters that are not well expressed in the
apical meristem, such as a heat-shock promoter.

EXAMPLE 5 - Cloning of a CO homologue from Brassica
napus

Low stringency hybridizations (Sambrook et al.,
1989) were used to screen a lambda genomic DNA library
made from Brassica napus DNA. Positively hybridizing
clones were analysed and classified by constructing maps
of their restriction enzyme cleavage sites (using
HindIII, XhoI, EcoRV, XbaI, EcoRI and NdeI). CO
homologues were distinguished from other members of the
CO gene family because of the similarity of their
restriction enzyme map with that of the Arabidopsis CO
gene, and because a second gene that is located close to
CO in the Arabidopsis genome was shown to be present at
a similar position in the Brassica clones. Two CO
homologues, corresponding to the genes present on
Brassica napus linkage groups N10 and N19 (Sharpe et
al., 1995), were then sub-cloned into plasmids and
sequenced. The sequence of the gene from the N10
linkage group is shown in Figure 5 and that from the N19
linkage group is shown in Figure 6. The amino acid
sequences of the proteins encoded by these genes are
very similar to that encoded by the Arabidopsis CO gene,
particularly in the regions demonstrated by mutagenesis
to be important for the functioning of the protein: 86
amino acids across the zinc-finger region are 84%
identical, and a 50 amino acid region at the carboxy
terminus of the protein, that is affected in two of the
Arabidopsis mutants, is 88% identical. These two
regions are the most conserved, with the intervening 187
amino acids from the middle of the protein being 64%
identical.
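
The three regional figures above can be combined into a single length-weighted identity for the 323 residues they cover. This is an illustrative calculation only; it assumes the three regions are contiguous and non-overlapping, which the text implies but does not state explicitly, and the region labels are names chosen for this example.

```python
# (region label, length in residues, percent identity) as quoted in the text.
REGIONS = [
    ("zinc-finger region", 86, 84.0),
    ("intervening region", 187, 64.0),
    ("carboxy-terminal region", 50, 88.0),
]


def weighted_identity(regions) -> float:
    """Length-weighted mean percent identity across the listed regions."""
    total_length = sum(length for _, length, _ in regions)
    identical = sum(length * pct / 100.0 for _, length, pct in regions)
    return 100.0 * identical / total_length


covered = sum(length for _, length, _ in REGIONS)
print(covered, round(weighted_identity(REGIONS), 1))
```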

This sequence analysis indicates that CO homologues
can be isolated from plant species other than
Arabidopsis. In addition, restriction fragment length
polymorphism mapping strongly suggests that CO
homologues are important in regulating flowering time of
other species. For example, in Brassica nigra a CO
homologue closely co-segregates with a major
quantitative trait locus for flowering time (U.
Lagercrantz et al, in press), and in Brassica napus CO
homologues mapping to linkage groups N2 and N12
co-segregate with allelic variation for flowering time.
TABLE 1 - Flowering time and segregation of kanamycin
resistance in T2 and T3 generations of co-2 plants
carrying the T-DNA of cosmid B or C

Transgenic        Ratio of Km   Average LN at     Average LN at     Ratio of Km
co tt4 line       resistant     flowering of T3   flowering of T3   resistant
scored            seedlings     individuals       individuals       seedlings
                  in T2 (1)     under LDs (2)     under SDs (2)     in T3

cosmid B line 1   3:1           4.6+/-0.4         14.0+/-2.5        1:0
cosmid B line 2   3.7:1         4.2+/-0.3         18.5+/-1.1        1:0
cosmid B line 3   2.9:1         4.6+/-0.8         13.5+/-4.1        1:0
cosmid B line 4   2.4:1         4.6+/-0.8         16.4+/-2.2        1:0
cosmid B line 5   3.0:1         5.1+/-0.5         18.5+/-1.1        1:0
cosmid C line 1   2.9:1         4.6+/-0.6         20.6+/-3.8        1:0
cosmid C line 2   3.4:1         3.9+/-0.4         11.7+/-3.2        1:0
cosmid C line 3   3.3:1         4.0+/-0.4         20.4+/-1.2        1:0
cosmid C line 4   4.9:1         3.7+/-0.3         7.6+/-5.3 (3)     1:0
cosmid C line 5   3:1           4.9+/-0.6         17.7+/-2.1        1:0
cosmid C line 6   3.8:1         3.5+/-0.5         6.6+/-1.4         1:0
Landsberg erecta                5.1+/-0.8         18.9+/-2.4
co-2                            12.4+/-1.0        18.1+/-3.4

Flowering time was measured by counting the number of
leaves present at the time that the flower bud appeared
in the centre of the rosette (Koornneef et al, 1991;
Experimental Procedures).

(1) Over 80 plants were tested in each family, except
for cosmid B line 3 in which 35 plants were used.
(2) 10 plants from each family were tested.
(3) The large standard error in this population was
due to 2 plants that flowered with 18 leaves, while the
other 8 had a leaf number of 5.1+/-1 at flowering.
Southern analysis of this line using a T-DNA fragment as
probe identified 6 hybridising fragments. The variation
in flowering time could therefore be due to the
segregation of one T-DNA copy that is required for early
flowering, or to the occurrence of co-suppression
repressing activity of the transgenes in some
individuals.
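
Most of the T2 kanamycin-resistance ratios in Table 1 are close to the 3:1 expected for a T-DNA insertion at a single locus. A chi-square goodness-of-fit test is one conventional way to check such a ratio. The sketch below is an illustration only: the seedling counts are hypothetical (the specification reports ratios, not raw counts), and the specification does not state that this test was used.

```python
# Chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_5PCT_1DF = 3.841


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Goodness-of-fit statistic against a 3:1 resistant:sensitive ratio."""
    total = resistant + sensitive
    expected_r, expected_s = 0.75 * total, 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)


def consistent_with_single_locus(resistant: int, sensitive: int) -> bool:
    """True if the counts do not differ significantly from 3:1."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_5PCT_1DF


# Hypothetical counts for two families of 80 seedlings each.
print(consistent_with_single_locus(60, 20))  # exactly 3:1
print(consistent_with_single_locus(70, 10))  # 7:1, suggests more than one locus
```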
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TABLE 2 - Flowering time of transgenic wild- type plants 
carrying extra copies of the CO gene 



5 


Lands - 

berg 

erecta 

trans - 

genie 

line 


Km 
in 
T2 1 


Average LN at 
flowering of 
T3 

individuals 
under LDs 2 


Average LN at 
flowering of 
T3 

indi vi Hna 1 c 

J- *. ±\JL .1. V A. \U. CL -L. O 

under SDs 2 


Ratio of 
kanamycin 
resistance 
in i j 


10 


cosmid B 
line 1 


3 . 4 

: 1 


4 . 4 + /-1 . 0 


18 . 1 + /-2 - 1 


1 : 0 




cosmid B 
line 2 


5 . 9 

: 1 


3 . 2+/-0 . 6 


10 . 1+/-2 . 2 


1:0 


15 


cosmid B 
line 3 


2 . 8 

: 1 


4 . 0+/-0 . 5 


19 . 6+/-2 .2 


1 : 0 




Lands - 

berg 

erecta 




5 . 1 + /-0 . 8 


18 . 9 + /-2 .4 






co -2 






12 . 4+/-1 . 0 


18.1+/-3 .7 




20 


1 Over 80 plants were tested in each family.

2 10 plants from each family were tested.
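
The kanamycin segregation ratios in the T2 column can be compared with the 3:1 ratio expected for a single T-DNA locus using a chi-square goodness-of-fit test. The sketch below is illustrative only: the seedling counts (68 resistant, 20 sensitive, a ratio of 3.4:1) are hypothetical, since the tables report ratios and not raw counts, and the function name is an assumption.

```python
def chi_square_segregation(resistant, sensitive, expected_ratio=(3, 1)):
    """Chi-square goodness-of-fit statistic for a two-class segregation."""
    total = resistant + sensitive
    parts = sum(expected_ratio)
    expected = [total * part / parts for part in expected_ratio]
    observed = [resistant, sensitive]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of chi-square for 1 degree of freedom at p = 0.05.
CRITICAL_5PCT_1DF = 3.841

# Hypothetical counts (not from the specification): 68 resistant and
# 20 sensitive seedlings, i.e. a 3.4:1 ratio.
stat = chi_square_segregation(68, 20)
fits_single_locus = stat < CRITICAL_5PCT_1DF
print(round(stat, 3), fits_single_locus)
```

Passing `expected_ratio=(15, 1)` tests the same counts against the ratio expected for two unlinked loci.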
CLAIMS



1. A nucleic acid isolate comprising a nucleotide 
sequence encoding a polypeptide with CO function. 

2. Nucleic acid according to claim 1 wherein said
nucleotide sequence is that of the CO gene of
Arabidopsis thaliana or a CO homologue from another
plant species, or a mutant, derivative or allele of the
gene or homologue.



3. Nucleic acid according to claim 2 wherein said CO 
nucleotide sequence is shown in Figure 1. 

4. Nucleic acid according to claim 2 wherein said CO
homologue is from Brassica.

5. Nucleic acid according to claim 4 wherein said CO
homologue nucleotide sequence is shown in Figure 5 or
Figure 6.



6. Nucleic acid according to claim 1 or claim 2 
wherein expression of said nucleotide sequence delays 
flowering in a transgenic plant. 

7. Nucleic acid according to claim 2 wherein the
polypeptide encoded by said nucleotide sequence is a
mutant or derivative of wild-type CO or a wild-type CO
homologue and expression of said nucleotide sequence
delays flowering in a plant.



8. Nucleic acid according to claim 2 wherein the
polypeptide encoded by said nucleotide sequence is a
mutant or derivative of wild-type CO or a wild-type CO
homologue and expression of said nucleotide sequence
promotes flowering in a plant.

9. Nucleic acid according to claim 1 or claim 2
wherein expression of said nucleotide sequence promotes
flowering in a transgenic plant.



10. A nucleic acid isolate comprising a nucleotide
sequence encoding a polypeptide able to complement a
mutant phenotype in a plant, which phenotype is delayed
flowering, the timing of flowering being substantially
unaffected by vernalisation.

11. A nucleic acid isolate comprising a nucleotide
sequence which is a mutant or derivative of a wild-type
gene encoding a polypeptide with ability to influence
the timing of flowering, the mutant or derivative
phenotype being delayed or early flowering with the
timing of flowering being substantially unaffected by
vernalisation.



12. Nucleic acid according to any of claims 1 to 11
further comprising a regulatory sequence for expression
of said polypeptide.

13. Nucleic acid according to claim 12 comprising an
inducible promoter.

14. Nucleic acid according to claim 13 wherein the
promoter is derived from a maize gene for a 27 kD
subunit of glutathione-S-transferase, isoform II.

15. A nucleic acid isolate comprising a nucleotide
sequence complementary to a coding sequence of any of
claims 1 to 11 or a fragment of a said coding sequence.

16. Nucleic acid which is DNA according to any one of
claims 1 to 12 or claim 15 wherein said nucleotide
sequence or a fragment thereof is under control of a
regulatory sequence for anti-sense transcription of
said nucleotide sequence or a fragment thereof.



17. Nucleic acid according to claim 16 comprising an
inducible promoter.

18. Nucleic acid according to claim 17 wherein the
promoter is derived from a maize gene for a 27 kD
subunit of glutathione-S-transferase, isoform II.

19. A nucleic acid vector suitable for transformation
of a plant cell and comprising nucleic acid according
to any one of the preceding claims.



20. A plant cell comprising nucleic acid according to
any preceding claim.

21. A plant cell according to claim 20 having
heterologous said nucleic acid within its genome.

22. A plant cell according to claim 21 having more
than one said nucleotide sequence per haploid genome.

23. A plant comprising a plant cell according to any
one of claims 20 to 22.

24. Selfed or hybrid progeny or a descendant of a
plant according to claim 23, or any part or propagule
of such a plant, progeny or descendant, such as seed.

25. A method of influencing a flowering
characteristic of a plant, the method comprising
causing or allowing expression of the polypeptide
encoded by nucleic acid according to any one of claims
1 to 14 from that nucleic acid within cells of the
plant.

26. A method of influencing a flowering
characteristic of a plant, the method comprising
causing or allowing transcription from nucleic acid
according to any one of claims 1 to 14 within cells of
the plant.

27. A method of influencing a flowering
characteristic of a plant, the method comprising
causing or allowing anti-sense transcription from
nucleic acid according to any one of claims 15 to 18
within cells of the plant.

28. A method of identifying and cloning CO homologues
from plant species other than Arabidopsis thaliana
which method employs a nucleotide sequence derived from
that shown in Figure 1.

29. Nucleic acid encoding a CO homologue obtained by
the method of claim 28.

30. Nucleic acid according to claim 29 which
comprises a nucleotide sequence shown in Figure 5 or
Figure 6.

31. A method of identifying and cloning CO homologues
from plant species other than Arabidopsis thaliana
which method employs a nucleotide sequence derived from
a sequence shown in Figure 5 or Figure 6.



32. Nucleic acid encoding a CO homologue obtained by 
the method of claim 31. 
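
Claims 15, 16 and 27 refer to a complementary sequence and to anti-sense transcription. As an illustration only, the following sketch derives the reverse complement of a short stretch of coding sequence. The 30-nucleotide example is taken from the start of the Figure 1 coding sequence; the function name is hypothetical, and the sketch assumes an unambiguous A/C/G/T sequence.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    dna = dna.upper().replace(" ", "")
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna.translate(COMPLEMENT)[::-1]


# First 30 nucleotides of the coding sequence shown in Figure 1.
sense = "ATGTTGAAACAAGAGAGTAACGACATAGGT"
antisense = reverse_complement(sense)
print(antisense)
```

Applying the function twice returns the original strand, which is a convenient sanity check on any fragment chosen for an anti-sense construct.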
Figure 1.

1 ATGTTGAAACAAGAGAGTAACGACATAGGTAGTGGAGAGAACAACAGGGCACGACCCTGT
MLKQESNDIGSGENNRARPC

61 GACACATGCCGGTCAAACGCCTGCACCGTGTATTGCCATGCAGATTCTGC^
DTCRSNACTVYCHADSAYLC

121 ATGAGCTGTGATGCTCAAGTTCACTCTGCCAATCGCGT^
MSCDAQVHSANRVASRHKRV

181 CGGGTCTGCGAGTCATGTGAGCGTGCTCCGGCTGCTTT^
RVCESCERAPAAFLCEADDA

241 TCTCTATGCACAGCCTGTGATTCAGAGGTTCATTCTC
SLCTACDSEVHSANPLARRH

301 CAGCGAGTTCCAATTCTACCAATTTCTGGA^
QRVPTLPISGNSFSSMTTTH

361 CACCAAAGCGAGAAAACAATGACCGATCCAGAGAAGAGACTGGTGGTGGATCAAGAGGAA
HQSEKTMTDPEKRLVVDQEE

421 GGTGAAGAAGGTGATAAGGATGCCAAGGAGGTTGCTTCGTGGCTGTTC
GEEGDKDAKEVASWLFPNSD

481 AAAAATAACAATAACCAAAACAATGGGTTATTGTTTAGTGATGAGTATCTAAACCTTGTG
KNNNNQNNGLLFSDEYLNLV

541 GATTACAACTCGAGTATGGACTACAAATTCACAGGTGAATACAGTCAACACCAACAAAAC
DYNSSMDYKFTGEYSQHQQN

601 TGCAGCGTACCACAGACGAGCTACGGGGGAGATAGAGTTGTTCCGCTTAAACTTGAAGAA
CSVPQTSYGGDRVVPLKLEE

661 TCAAGGGGCCACCAGTGCCATAACCAACAGAATTTTCAGTTCAATATCAAATATGGCTCC
SRGHQCHNQQNFQFNIKYGS

721 TCAGGGACTCACTACAACGACAATGGTTCCATTAACCATAACGCATACATTTCATCCATG
SGTHYNDNGSINHNAYISSM

781 GAAACTGGTGTTGTGCCGGAGTCAACAGCATGTGTCACAACAGCTTCACACCCAAGAACG
ETGVVPESTACVTTASHPRT

841 CCCAAAGGGACAGTAGAGCAACAACCTGACCCTGCAAGCCAGATGATAACAGTAACACAA
PKGTVEQQPDPASQMITVTQ

901 CTCAGTCCAATGGACAGAGAAGCCAGGGTCCTGAGATACAGAGAGAAGAGGAAGACAAGG
LSPMDREARVLRYREKRKTR

961 AAATTTGAGAAGACAATAAGGTATGCTTCGAGGAAGGCATATGCAGAGATAAGACCGCGG
KFEKTIRYASRKAYAEIRPR

1021 GTCAATGGCCGGTTCGCAAAGAGAGAAATCGAAGCCGAGGAGCAAGGGTTCAACACGATG
VNGRFAKREIEAEEQGFNTM

1081 CTAATGTACAACACAGGATATGGGATTGTTCCTTCATTCTGATA
LMYNTGYGIVPSF*
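
The pairing of nucleotide rows with the one-letter amino acid rows beneath them can be reproduced with a small translation routine. The sketch below assumes the standard genetic code and translates only the first 60-nucleotide row of the listing; the names `CODON_TABLE` and `translate` are illustrative and do not come from the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row (nucleotides 1 to 60) of the listing above.
first_row = "ATGTTGAAACAAGAGAGTAACGACATAGGTAGTGGAGAGAACAACAGGGCACGACCCTGT"
print(translate(first_row))
```

Codons containing characters other than A, C, G or T are reported as `X`, so rows damaged in reproduction show up as visible gaps in the translation.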
Figure 2.

1 AAGCTTGGGC GTAGGTGTTG TCTATCGGCG AAAACACGCG CGGTACGCCA
51 AGAACAGCGC GGCCATCTCC ATCCCAGGCA CGGTGCGCCC GCTTTTTCGC
101 CGTCTCGCTG AGTCACGGCG GGCGTCCAGC AGGTAGTTGA GCGCCTTCCG
151 CGGCACGAAT CGCTGCGTGC GQCCCGGATC TGGTCGAGTT GGTAGTCAGC
201 GTCGGTGTCG AATGCCGGGA CGTCGACCAG GAAGAAGTTG CCGTCGCTGG
251 GGTGGGGACG GAAGGCGTCA GGATTGTCGC AAGGGCAGAG CCCAGCCTGC
301 GGGCGGGGCT ACCTCGTCGA CGCCTCGGCA CGGCGGCGGC AAAGCTGCTG
351 CGGGACGTGC CCGCCTGGGC CGCCTTCTCG GTGAAGTGGT CCTCGAAGGG
401 GACGAGCTCG CTGGGGTCAA ACCACCCCAT AGCTCGAGTC ACCGAAGAAG
451 GCGACGAGGA CGAGCCCGTC GCGGTGGCCG CGGTGTACCT CCTCGTCGTC
501 GGTGAGGCTG ACGCTGTAGA TATGGCCAGG CCACCACGGA TGGGACTTCA
551 CCTTGGCCCA GACCATGTCG CCGAACCGGG GGCCGCCGTT CGCCCATGCG
601 ATGCCGCGTC CGGCAGCAGG AACCATGGCG CCTCCAGCGG CGGGGTCGGA
651 CATCCTGTGG AGGGGAACCG AAAACCTAGA TTTGGATGCA GGTTCGATTG
701 GTCTGGGCTT GGGTTTGGGT TCCGGAGGAG GGTGGCCTGG GATCGGTGGA
751 AGGAGGGACA TTGTTGGTAA TTTTTATTAT TTTATAATAT GGAGAAATTC
801 GAGAGACTGA ACGATGGTGA TGTTTATTTG AGGACTATGT AGTATAAAGT
851 GTAAAATAGT ATTTTATCAA GTTTATATTC ACGTTTTTGC TGAAGATAGT
901 ATAATAGTGG AGTTGTTTTT GGCGGCTACA TAATCTTAGG CTATCTTCTC
951 GGTCGCTCTC ATATCATATC TACTATCACA TTCTCTATTT TAAATTTCAC
1001 TTTGTGTAAT CTACACTATA AAATAGTGTT TTACACGGTA TGTTGTACAC
1051 AGCCTTATCG TGGCGCGACG GAGTTGGATA GAGATOGTGA ACAGCTGGAT
1101 AGATATGATT TATAGGCGAT TGGGTAGATG TGATTTGATA GGTGGTTATG
1151 TAGGAGCGAT TTAGTGAGAC ATTGTAAATA ATTAGGTTGA TGTGATCCGA
1201 GGATGGCTAG GTAGATATGA TTTTAATGGA TGGTTTGGTG GACTAAGTTA
1251 TGTCGACATT ATAATATGTT TTAAATTTCT AAGAAATTGT TTGTGTTAAA
1301 TTGTATCCCA CATAGATTAT TTAGCCATCT CAAAGAGAGG TTTGGGTTGT
1351 TTACACAAAT AAAATATTCG TTTGCTTCTA CAATTTATAT GTTTTTTATT
1401 TACATGAAAA CTATATTTTT TATTCATCTA CTCACCCAGC ACAGAAATTC
1451 TGGTTGAGTA GATGAAAAAA AACTACAACA AACTCTTCCT GAAAGTGTCG
1501 GTGTGAAGCC GAGAAATCCT TTTCATTTCG GTGACGGAGC CCCTTGCTGG
1551 CTGCTGCTCA GTGCACTCCG TTCGCCTGCC TGCCACTACA AGCGACGGCC
1601 GACGACTCGC AAGTATCGGT AGGCATTTTA AAACTGAAAA CCAAATCTAA
1651 ACCCGAATAG ACCAAATPGT IXGGTTTATTC GGGTTTTTGG CTTCGGATTC






WO 96/14414 PCT/GB95/02561 



1701 GGTTTCTAAA TATGCTATAT TTTAGGGTAT AGGTTCGGGT TCAGTTTCTA
1751 ACCTTTAAAA CCTGAATAGA CGAATAACCC GAAATATAAA AAATCTCTTA
1801 ATATGTGATG ATATTATTAT ATGATTTATG AACTTATTAA CCGAAAATAA
1851 TGATACCATC CTAACGATAG TATATATATC TATGTATGCT ATTTTTATAG
1901 TCACTTGTTG TAATAATAGT ACTTCCAATT AATTAATCAG TGTATATATT
1951 TTAACAAAAG ATACTAGOCT CTCTACTATT TGAGTATATT CGGTGCACCG
2001 AATAGACCGA ACCGAAATTG TAAGTCTATT CAGGTTCGGT TCCTAAAATT
2051 ATTTTAAAAA TTTTGGTTCT CATATTTCAG AATCCGAAAT TTCATAAATC
2101 CAAATAGACC GAACCAAATT ACGCTAATAG ACCGAATAAC TAGCGTACTC
2151 GCAAGTCGCA CCCCACTAGC CTGCTGCGTG CGTAAGCGAG GACGTCACGC
2201 GTTCTCCCTC CCGTCGACCA AATACACTTG GTCTTCTAGC ACCTTCTTCC
2251 TCTCCAAGAC TCCAATCCCC CAACCACCAG AACCAGCGCC AGCTCTAACG
2301 TCACCTCTGA TTTCTCTCTC CTCTCTATTG CTAGCTGCTT TATTATAAGT
2351 AGCAGCTGCA GCAGGCAGGA GCTGCACACA CCCATCCAAT TCCAGCTGCT
2401 GATCTTGATC CTGCACCCCG AGCCGTACAC AAGAGCTAGT CGGTAGAACT
2451 TGCAGGAGCG GAGCAGAACT AAGTGCAGAG AACAGGACAT ATG

Figure 3.

1 GAATTCATGT ACCAAATCAA TACTTTTTAG CCATAAATGA GTCAGTTTTA
51 GTATCCACAT GAATTTACCT ACCAGAGTGT TGTAAATTAT GTTCTTTTGG
101 GGCCACTTAC ATGGATCTCA TTCATTCACT GCAGCGAGTT CTCAGACCAC
151 CAGAAAATTT ATTCAGTGAT CTGTTTTGAT CATGCAACAT AAACTTATAA
201 GCCACACAAG CAAAACAAAG ATATCCCATG TTGCATATAA TACGAGCTAG
251 CATATCATAA AGAAGGAAAC TTGAAGTAGC AAAGTTTCTA CTAAATTTCT
301 TGTCAGGAAT TTTTAAAATG CAATGACAAC CACTTGGAGC ACTATGAGTT
351 TCAGAGCCAA TAGAATGTTA CTATTTGGTG TGGATTCGAG CTAGCACGTG
401 AAAGTGCATA AAAGTGATTA CCTTTTGCCA AAGGTCACTG CACTTTTCCT
451 CAGATAGTTT CTCACAGCCA TGGAAAGTGG AGAATCCGCA TAAACGTACA
501 ATTACAAGCT TTATATGGTC CCTCGACTCT TATTCTCTTC TCAGTCTTTG
551 CAACTAAATA GGGTTTTCGT TAATCTGAAA GAAGCAAAGT ATTCGAAACC
601 ACGGAAACCT GATAAAGAAT GAAAACAAAT AAGCAATAGT GTTTTCTTGA
651 AAATCTCGAT GCAACTTTGA GGATATTGTT ACATATGATC TATTACTCGT
701 AACAGTTATC CGAAGGCCTA CACATGTGAG AGAAGTTCCA AACCGCTACA
751 ACAATAAACT TAATTAGAGA CTGTCAACGA GCAATAATAA GCAAAACTAC
801 TTTTTTCTTG AGCTACAAGT GAAAAGGCCA ATACACAATT TACTCTTCAT
851 GAACTCGAAC CACGTTACAA TCTCCAAAAA ATTTCATCAC CAAAGCACTA
901 AAAGCCAAAG ATGCCTCAAC TTATCCAACT TGGCAGGATA AAGATCTCCA
951 AAAATGCTTA CTAAAGAACC TAGAATCTTT TCTTTAGAAT TCAATGATCA
1001 TATAACCATT TCATAACAAT TCTAAATGCC ATTACATTCA TCGTAAAACC
1051 AGTAAATAAC AAGAACTTGT ATGTTAAGTT CCAATTACCA AGCAAAAAAA
1101 AACTTTTCAA AGTTTAAAGT TCAAAATGGG AAAGAGAAGT GCGGTGTAAG
1151 CAAATATGAA AGAGGAAGAG ATGCGAAAAG TGTATCCTAG GACCAGCATT
1201 TTATACAAAA AAAAAACACT CACTTTTCAG CTCTTAAGGC ATAGAGTGAA
1251 GGTAGCCATA TGAATTTGGC CACTAGAGCG TCCGTCAAAT CTCATTCTTT
1301 TTGGACCACA TAATGGGTAT CATACATTCA CTGGACCCAA AAGCGTAACT
1351 GGAGCTAGTC CTCAAACCTA GAGAGTATCG TATCCTGTAG CTTCCACATA
1401 GTAAACATTA TGAGCATAAC ACCAACAAGG CAACTCCAAG TACTAATGGT
1451 TATTAGTACA GGAAAACCCA CGATGCTAAA CACATGAATG GGTCACCAAA
1501 TAGAGTGAAG ATGGTTAAAT TGCATCTATG GATCATGTGG ACTAGTAAAT
1551 GAGTGTAGCA GAAAACTTCA CAATTACCTC TGTGATCTTA GAAACATGTC

1601 CTGAAAATTC CATACAAGTG TCCTTTGTAT TAGATTACTT CCACAGGTTC
1651 AGATCTAATA AAGCTACAAT AAATAGTATA GAGTATCATC ATAAACCCAA
1701 ATTACAGAGA TGTGACAACA CTCATGAGTC ATGTTTTGTA ACTACTTACT
1751 ATAATGGTTA CCAAGTGCAA ATTTCTACAT ACTATATATG ATAAATCTAA
1801 TTATTGCTCA TGTGGACTCC AAAATGCCTT TTAAGTTTTA ACTTGTGCGT
1851 CAGGTAAATT CTAATTTGTA GTCTCAAGAC TACTTGGCGG ATTCGAGTTT
1901 GATCCTAGAA AATCCACCGT CTCTATGTTT TTCATGTCAC TTTTCCGATA
1951 TGATTCTCAT TACCATGACT TTATGAACCA GATTAAACAT TATAACACTT
2001 TTCATCAGAA AATCCTTCGA AAGTTTCAAT TGCAAATCTT TCTAAATGAT
2051 GCAGATGCAT TCACAAATAA TGGAACAACA ACTATACCAT ATTCACGAGT
2101 TTGTCTAACC TTTGTATAGG TAGTCAACCC ATAACAGTTG GTGATGGCTC
2151 TGACACTCGA AGCCTTACTC GGAGAGATAC CTGAACAGTA ATCACAAGGT
2201 TCAGGATGAA TATTCAACCA CTTAAACTTT GTATAAAGCC AAAGAGATAA
2251 AACGAATCTA GCTTTACTTT AAATAAAATG CATATGAAAA TAGTAAAAGG
2301 TGATACGAAA AAATAGTAAC AATTTGCCTG CAACACCATG GCATTATCCG
2351 GACCACTTCC TCTTGAGAAT CTCAGTATGG CAAGTGGCAA AACCTAAGCA
2401 ACTTGTGAAC GGGTCCCAAC GAAGAAGTGC ATAGGAGGAG ATGTTTACAC
2451 TTTACACTTT ACACTTTACA CTTTACACAT AGGCCTTCCC AAAAGCTCAA
2501 CTAGCTGCAA GAGGATCCAA TAACATGTAA GAGCCACTAA CGCTGTGCCA
2551 CGTGTAGGCA CTCAGGATTC GATCTTCCCC TCTACTTATT CTCTCACACC
2601 AGATATAAGC TTTATTAGCC CCTTCTTTCA GATACCAGCT CCCACACCAT
2651 CAAACTTACT ACATCTGAGT TATTATGTTG AAACAAGAGA GTAACGACAT
2701 AGGTAGTGGA GAGAACAAGA GGGCACGACC CTGTCACACA TCCCGCTCAA
2751 ACGCCTGCAC CGTGTATTGC CATGCAGATT CTGCCTACTT GTGCATGAGC
2801 TGTGATGCTC AAGTTCACTC TGCCAATCGC GTTCCTTCCC GCCATAAACG
2851 TGTCCGGGTC TGCGAGTCAT GTGAGCGTGC TCCGGCTGCT TTTTTGTGTG
2901 AGGCAGATGA TGCCTCTCTA TGCACAGCCT GTGATTCAGA GGTTCATTCT
2951 GCAAACCCAC TTGCTAGACG CCATCAGCGA GTTCCAATTC TACCAATTTC
3001 TGGAAACTCT TTCAGCTCCA TGACCACTAC TCACCACCAA AGCGAGAAAA
3051 CAATGACCGA TCCAGAGAAG AGACTGCTGG TGGATCAAGA GGAAGCTGAA
3101 GAAGGTCATA AGGATGCCAA GGAGGTTGCT TCGTGGCTGT TCCCTAATTC
3151 AGACAAAAAT AACAATAACC AAAACAATGG GTTATTGTTT AGTCATGAGT
3201 ATCTAAACCT TGTGGATTAC AACTCGAGTA TGGACTACAA ATTCACAGGT
3251 GAATACAGTC AACACCAACA AAACTGCAGC GTACCACAGA CCAGCTACGG
3301 GGGAGATAGA GTTCTTCCGC TTAAACTTGA AGAATCAAGG GGCCACCAGT
3351 GCCATAACCA ACACAATTTT CAGTTCAATA TCAAATATGG CTCCTCACGG
3401 ACTCACTACA ACGACAATGC TTCCATTAAC CATAACGTAA GGCTTTTGTA
3451 [illegible] CCCTTCAATT TAGCATCTTC CCATAACGCA GCAGGGTGAA
3501 TTCTTTCATC ATACACACAA ATCCACTGAT CCACTGCCAA CAGTTGATCT
3551 [illegible] GAAATTTCAC CAGAAGTCTA TAATAAAAAC AATATATGCT
3601 [illegible] [illegible] TTAGTCCTCT TACCAGGGGG ATTGAGAATG
3651 [illegible] TGTCATTAGG CATACATTTC ATCCATGCAA ACTGGTGTTG
3701 [illegible] AACAGCATGT GTCACAACAG CTTCACACCC AAGAACCCCC
3751 [illegible] TAGAGCAACA ACCTGACCCT GCAACCCAGA TGATAACAGT
3801 AAGACAACTC AGTCCAATGG ACAGAGAAGC CAGGGTCCTG AGATACAGAG
3851 AGAAGAGGAA GAGAAGGAAA TTTGAGAAGA CAATAAGCTA TGCTTCGAGG
3901 AAGGCATATG CAGAGATAAG ACCGCGGGTC AATGGCCGGT TCGCAAAGAG
3951 AGAAATCGAA GCCGAGCAGC AAGGCTTCAA CACGATGCTA ATGTACAACA
4001 CAGGATATGG GATTGTTCCT TCATTCTGAT ACTCCTGTGG CAAAAAGAAA
4051 AACTAGATTG CAAGCTGTAA ATTACTTTTA GTTTGAGATT ATGTTAGGTT
4101 TGGTGAAATT CTTAGCTTCA AGAAGTATTA CTACTGTTGT GCAAATGGGT
4151 TTGTAGTTTT GGCTAATTAA AACTATAGTA TTCT'lXJTTTC TCTGCATTAG
4201 T

Figure 5.

1 ATGTTCAAACAAGAGAGTAACAACATTGGTAGTGAAGAGAACAACACCGGGGCGCGAGCT
MFKQESNNIGSEENNTGARA

61 TGTGACACATGCGGGTCAACCATCTGCACCGTGTACTGCCATGCTGACTCCGCCTACTTA
CDTCGSTICTVYCHADSAYL

121 TGCAATAGCTGCGATGCTCAAGTCCACTCTGCCAATCGCGTTGCTTCCCGCCATAAAAGG
CNSCDAQVHSANRVASRHKR

181 GTCAGAGTGTGCGAGTCATGTGAGCGTGCCCCTGCTGCTTTTATGTGTGAGGCAGATGAT
VRVCESCERAPAAFMCEADD

241 GTGTCTCTATGCACAGCCTGTGATTCAGAGGTTCACTCCGCAAACCCTCTTGCTAGACGC
VSLCTACDSEVHSANPLARR

301 CATCAGCGAGTTCCAGTTGTGCCGATAACTGGAAACTCTTGCAGCTCCTTGGCCACCGCT
HQRVPVVPITGNSCSSLATA

361 AACCACACAACAGTGACCGAGCCAGAGAAGAGAGTGGTGTTAGTTCAAGAGGATGCCAAA
NHTTVTEPEKRVVLVQEDAK

421 GAGACGGCTTCATGGTTGTTCCCTAAAAACAGTGACAATCACAACAACAACAACCAGAAC
ETASWLFPKNSDNHNNNNQN

481 AATGAGTTGTTGTTTAGTGATGACTATCTAGACCTTGCTGATTACAACTCGAGTATGGAC
NELLFSDDYLDLADYNSSMD

541 TACAAGTTCACTGGTCAATACAATCAACCTACTCAACATAAACAAGACTGCACCGTACCA
YKFTGQYNQPTQHKQDCTVP

601 GAGAAAAACTACGGTGGAGATAGAGTTGTTCCACTCCAACTTGAAGAAACAAGAGGAAAC
EKNYGGDRVVPLQLEETRGN

661 TTGCACCACAAGCAACATAATATCACGTATGGCTCCTCAGGAAGTCACTACAACAACAAT
LHHKQHNITYGSSGSHYNNN

721 GGTTCCATAAACCATAACGCATACAATCCATCAATGGAAACTGACTTTGTTCCGGAGCAG
GSINHNAYNPSMETDFVPEQ

781 ACAGCACCTGACAAAACAGTTTCACATCCAAAAACGCACAAAGGGAAGATAGAGAAACTA
TAPDKTVSHPKTHKGKIEKL

841 CCTGAACCTCTAATTCAGATTCTCAGTCCAATGGACAGAGAAGCTAGAGTCCTGAGATAC
PEPLIQILSPMDREARVLRY

901 AGAGAGAAGAAGAAGAGAAGAAAGTTTGAGAAGACAATAAGGTATGCTTCAAGGAAGGCA
REKKKRRKFEKTIRYASRKA

961 TATGCAGAGAGAAGACCGAGGATCAATGGACGGTTTGCAAAGATTAGTGAAACCGAAGTA
YAERRPRINGRFAKISETEV

1021 GAGGACCAAGAGTACAACACAATGCTAATGTACTATGACACAGGATATGGCATTGTTCCT
EDQEYNTMLMYYDTGYGIVP

Figure 6.

1 ATGTTCAAACAAGAGAGTAAGAACATTTGTAATAGAGA
MFKQESNNICNRENNRGARA 

61 TGTGACACATGCGGGTCAACCATCTGCACCGTGTACTGCCATGCT
CDTCGSTICTVYCHADSAYL

121 TGCAATAGCTGCGATGCTCAAGTCCACTCTGCCAATCGCGTTGCTTCCCGCCATAAACGT
CNSCDAQVHSANRVASRHKR

181 GTCCGGGTCTGCGAGTCATGTGAGCGTGCCCCTGCTGCTTT
VRVCESCERAPAAFMCEADD

241 GTGTCTCTATGCAGAGCCTGTGATTTAGAGGTTCACTCCGCAAACCCT
VSLCTACDLEVHSANPLARR

301 CATCAGCGAGTTCCAGTTGTGCCGATAATTGGAAACTCrTTGGAGCTCCTT
HQRVPVVPIIGNSCSSLATA

361 AACCACACAACAGTGACCGAGCCAGAGAAGAGAGTGGTGTTAGTTCAAGAGGATGCCAAA
NHTTVTEPEKRVVLVQEDAK

421 GAGACGGCTTCATGGTTGTTCCCTAAAAACAGTGACTATCACAACAACAACAACAACCAG
ETASWLFPKNSDYHNNNNNQ

481 AACAATGAGTTGTTGTTTAGTGATGACTACCTAGACCTTGCTGATTACAACTCCAGTATG
NNELLFSDDYLDLADYNSSM

541 GACTACAAGTTGACCAGTCAATACAATGAACCT
DYKFTSQYNQPRHKQDCIVP

601 GAGAAAAACTACAGTGGAGATAGAGTTGTTCCGCTCCAACTTGAAGAAACAAGAGGAAAC
EKNYSGDRVVPLQLEETRGN

661 TTGCGGAACAAGCAACAQAATATCACATATGGCTCCTCAGGAAGCCAATACAACAACAAC
LRNKQQNITYGSSGSQYNNN

721 GGTTCCATTAACCATAACGCATACAATCCATCAATGGAAACTGACTTTGTGCC
GSINHNAYNPSMETDFVPEQ

781 AGAGCACCTGACACAACAGTTTCACATCGAAA
TAPDTTVSHPKTHKGKTAQL

841 CCTGAACCTCTAATTCAGATTCTCAGTCCAATGGACAGAGAAGCTAGAGTCCTGAGATAC
PEPLIQILSPMDREARVLRY

901 AGAGAGAAGAAGAAGAGAAGAAAGTTTGAGAAGACAATAAGGTATGCTTCAAGGAAGGCA
REKKKRRKFEKTIRYASRKA

961 TATGCAGAGAGAAGACCGAGGATAAATGGACGGTTTGCAAAGATGAGTGAAACCGAAGTA
YAERRPRINGRFAKMSETEV

1021 GAGGACCAAGAGTACAACACAATGCTAATGTACTGCGACACAGGATATGGCATTGTTCCT
EDQEYNTMLMYCDTGYGIVP

1081 TCATTCTATGGCCAAAAATAA
SFYGQK*
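
The two homologue listings above can be compared row by row, because their amino acid rows share the same coordinates. The sketch below computes a simple ungapped percent identity for the first row of each listing (the rows beginning MFKQESNNIGSEE and MFKQESNNICNRE). It is an illustration only: the function name is hypothetical, and a real comparison of homologues would normally use a gapped alignment.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# First amino acid row (residues 1 to 20) of each homologue listing.
first_listing = "MFKQESNNIGSEENNTGARA"
second_listing = "MFKQESNNICNRENNRGARA"
print(percent_identity(first_listing, second_listing))
```

The same function can be applied to any other pair of aligned rows, provided neither row is truncated.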